FOREWORD


Army  training  developers  need  tools  to  aid  in  the  design, 
acquisition,  and  use  of  simulation-  and  computer-based  programs 
of  instruction  for  weapon  operation  and  maintenance.  One 
critical  need  is  a  job  aid  for  the  design  and  evaluation  of 
training  devices  during  all  stages  in  the  weapon  acquisition 
cycle. 

This series of three reports describes one approach to such
aiding: a hybrid of decision analysis and mathematical modeling.
The approach provides numerical estimates of device effectiveness
that are based on expert ratings of trainee and task
characteristics, functional and physical similarity between
the proposed device and the operational equipment, and the
instructional characteristics of the device. It is an analytic,
computer-based technique (a menu-driven system) that can be
used at any stage of training device design.

The  product  of  this  research  can  help  training  device 
procurers  such  as  PM-TRADE  and  training  developers  in  TRADOC 
make  better  documented  decisions  about  training  device  design. 
Forecasting Device Effectiveness
EXECUTIVE  SUMMARY 


Requirement: 

To develop a conceptual framework and methodology for
predicting the effectiveness of a training device or
simulator; to analyze and summarize training device evaluation
issues, including criteria of training effectiveness,
variables that influence effectiveness, and constraints
that affect device evaluation in either its empirical or
rational form.

Procedure: 

A literature review was conducted and the process of
acquiring training devices within the Life Cycle System
Management Model was analyzed. Theoretical and practical
issues of training device design, development, and evaluation
were investigated. Results were used to construct a
conceptual framework within which to develop a procedure
for predicting device effectiveness.

Findings: 

Training  device  evaluation  can  be  viewed  within  the 
more  general  context  of  a  program  evaluation  rationale. 

This  model  consists  of  a  network  of  hypotheses  that  relate 
program  inputs  and  activities  to  a  series  of  intermediate 
outcomes  that  also  are  logically  linked.  The  model 
provides  for  multiple  criteria  of  training  effectiveness. 
These  include  skill  acquisition,  transfer  of  training,  and 
efficiency  of  training  and  transfer.  The  model  also 
provides  for  several  different  classes  of  variables  that 
hypothetically  may  influence  effectiveness.  In  both  of 
these  respects,  the  conceptual  framework  is  superior  to 
earlier  models  that  have  been  more  narrowly  focused. 

Utilization  of  Findings: 

An analytic method for forecasting training device
effectiveness can be developed from the conceptual framework
described in this report. Such forecasts are of value
during the device acquisition process when opportunities to
conduct empirical research and evaluation are severely
limited.
FORECASTING DEVICE EFFECTIVENESS:


I.  ISSUES 

1.  Introduction 

This report is submitted in partial fulfillment of
Contract MDA 903-82-C-0414 between the U.S. Army Research
Institute (ARI) and the American Institutes for Research
(AIR). It is part of a programmatic effort to develop and
analytically evaluate a model designed to forecast training
device effectiveness. This report, the first of a series,
discusses a number of issues that bear on the development
of formal analytic methods for predicting the potential
effectiveness of alternative device designs. The discussion
encompasses theoretical, practical, and methodological
issues uncovered during our review of the literature and
analysis of the problem.

Background 

The Army relies on training devices and simulators as
indispensable components of performance-based training.
Devices can be designed to incorporate instructional
features that, for example, provide for control of
feedback, repetition of exercises, freeze and playback, and
adaptive sequencing of instruction; these features are
associated with specialized hardware and software that are
not typically available on the parent equipment. Likewise,
devices are often safer, more available, and cheaper to use
than operational parent equipment.

To support the acquisition of cost-effective training
devices, the Army has formalized a four-phase process that
is linked to the Life Cycle System Management Model (LCSMM)
of the parent materiel system (Carroll, Rhode, Skinner,
Mulline, Friedman, & Franco, 1980; CORADCOM, 1980; Kinton,
1980; Kane, 1981). Kane and Holman (1982) provide an
idealized description of the four phases of device
acquisition and the corresponding hardware development cycles.

In each successive phase of acquisition, training
device design decisions presumably are based on more
detailed and precise information about the training
requirement to be met, the physical and functional
characteristics of the device needed to satisfy that requirement,
the manner in which the device will be utilized, its
effectiveness, and its cost. The intent of the many steps in the
formal acquisition process is to ensure that the initial
and often vague training concept is translated into
cost-effective training equipment that troops eventually
interact with, at school or in the field. The great appeal
of a highly structured acquisition process is that its many
phases and steps are conceptually coherent, promising a
procedure for systematically raising and then empirically
resolving training device design issues.

In practice, however, unavoidable logistical demands
in the training device acquisition process and the LCSMM
that supports it make implementation in its idealized form
impossible. As a consequence, the design of cost-effective
training devices continues to be fraught with difficulty.
For example, constraints in the acquisition schedule
imposed by development of the parent system often preclude
empirical evaluations during the design and development
process; if such an evaluation is conducted, for example at
Operational Test (OT) I or OT II, it is usually too late in
the acquisition process to modify device design based on
the evaluation results. As a necessary consequence,
appraisals of a particular design or of competing design
alternatives are primarily analytic.

However, for several reasons -- lack of reliable and
valid analytic tools, paucity of applicable research, etc.
-- formal analytic procedures are inadequate or
nonexistent. The bases on which device design decisions
are made have not been clearly articulated, nor is it clear
what types and levels of data are needed to support each
decision. Thus, there is a need for analytic procedures,
applicable during both early and later stages of device
acquisition, that permit prediction of the potential
effectiveness of alternative device designs.

To date, only a handful of analytic methods and models
have been developed that attempt to evaluate or predict the
effectiveness of training devices. Most of these have
emerged from a program of research sponsored by ARI. The
objective of these efforts has been to develop methods to
forecast transfer of training based on information about
training device characteristics. There have been several
recent reviews of these methods (e.g., Tufano & Evans,
1982; Harris & Ford, 1983; Knerr, Nadler, & Dowell, 1983).
We will not repeat these reviews here; rather, we will
summarize the limitations that one or more of these reviews
have remarked upon.

o  None of the methods has been satisfactorily
   validated empirically:

   --  Virtually no empirical studies have been
       attempted;

   --  A "criterion problem" of what to measure
       and how to measure performance has limited
       the evaluation of the methods;

   --  In many cases, it is not feasible to
       measure operational performance on the
       parent equipment.

o  The models have too narrow a focus:

   --  Extra-device variables (e.g., utilization,
       student and instructor acceptance, student
       capabilities, etc.) have not been
       included;

   --  Device and system characteristics affecting
       learning have not been considered;

   --  Models have not addressed such issues as
       criticality or importance of training.

o  The models have been inefficient to apply:

   --  The few that have been developed consist of
       tedious, manual, paper-and-pencil
       procedures;

   --  They provide a microscopic level of
       analysis.

o  The models are of limited diagnostic utility:

   --  They arbitrarily aggregate judgmental data,
       thereby producing relatively
       uninterpretable summary indexes;

   --  Algorithms and rationales for decisions
       based on obtained indexes are arbitrary or
       not specified.

Recognizing these limitations, ARI has sponsored the
current project, the major objective of which is to build
upon previous efforts and overcome their shortcomings. In
support of this effort, AIR reviewed literature and conducted
conceptual analyses to examine the utility of transfer
as a dependent/criterion variable, explored alternatives
and supplements to transfer for assessing device
effectiveness, and ascertained variables hypothetically affecting
various effectiveness criteria. Based on our findings, we
provided recommendations for alternative or supplemental
criterion measures, for modifications of ARI's ADP-based
effectiveness forecast system, and for additional research.

Organization of This Report

This  report  is  organized  around  several  issues  related 
to  the  evaluation  of  effectiveness.  For  each  major  issue, 
we  address  a  number  of  questions,  present  various 
arguments,  and  attempt  some  resolutions  where  appropriate. 


In the following chapter, we discuss two fundamental
theoretical issues. First, what actually do we mean by the
term "device effectiveness?" That is, what should be the
criterion of device effectiveness and how should it be
measured? In this latter connection, we address the following
questions: What is transfer of training? How is it
measured? What are the pros and cons of its use as a
measure of device effectiveness? What are the alternatives
to transfer of training as measures of effectiveness? In
this regard we discuss several possibilities, including
acquisition of skills and knowledge, acquisition efficiency,
and other concepts.
The second major issue concerns the "content" of an
effectiveness evaluation model: What are the classes and
types of variables that hypothetically, at least, influence
device effectiveness? In this discussion, we introduce a
"program evaluation framework" to help organize these
variables and to aid the conceptualization of the training
system design and evaluation problem.


In Chapter 3, we discuss practical and methodological
issues related to real-world constraints on developing and
evaluating a training system effectiveness forecasting
procedure. Topics include the impact of the LCSMM,
difficulties of criterion measurement, constraints on
statistical techniques used in evaluations, and limitations
on the measurement of variables.


2.  Theoretical  Issues 


Overview 

An ideal methodology for analytically evaluating (or
forecasting) the effectiveness of a training device or
simulator would have several properties. First, in accord
with the existing LCSMM, it would be applicable at
different stages of device design and development. Second, it
would be diagnostic -- it would indicate which device
features contributed to effectiveness and which ones detracted
from it. Third, it would be easy to use. Fourth, it would
support different levels and types of decisions (e.g.,
"Will Device 1 shorten skill acquisition time on the
operational equipment?" "Is Device 1 more cost-effective than
the alternative designs?").

When contemplating development of a method for
evaluating devices, one immediately encounters two
fundamental sets of concerns. First, what actually do we mean when
we say that a device is "effective?" What would be our
criterion of device effectiveness and how would we measure
it? Second, what would be the content of our forecasting
method? What are the classes and types of variables that
would (or could) influence device effectiveness? These two
concerns -- specification of criterion dimensions and
specification of predictor variables -- are addressed in
this chapter.

Issue:  What  is  Device  Effectiveness? 

What do we mean when we claim a device is "effective?"
Traditionally, effectiveness has been expressed in
terms of transfer of training. We will discuss this
concept below. Following this discussion, we will present
other potential criteria of effectiveness.

Transfer of training: Definition. "Transfer" has
been used to refer to an empirical phenomenon, defined by
the results from specific experimental paradigms. For
example, a simple transfer paradigm is:

Group 1: Trains on Training Device A -->
         Trains to criterion performance on operational task

Group 2: No training -->
         Trains to criterion performance on operational task

To the extent that Group 1 reaches operational
proficiency faster than Group 2, we say that Group 1 has
benefited by "positive transfer." Thus, transfer is
defined as the beneficial (or harmful) effect of specific
previous learning on the learning of a new task. Depending
on the paradigm and the measures of performance used, we
can define "first-trial" transfer (i.e., the beneficial or
harmful effect of specific previous learning on initial
performance of a task), "long-term" transfer (the effect of
previous experience on the rate of skill acquisition on a
new task), and other transfer terms. The important point
is that "transfer" is defined by the experimental paradigm
and measure of performance used; it is an index of
differential performance produced by specific experimental
manipulations. (For a further discussion of transfer
indexes and theoretical underpinnings, see Appendix A.)
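
As an illustration of how the two-group paradigm yields a
numerical index, the following Python sketch computes a
savings-type percentage from trials-to-criterion scores. The
formula is a common textbook form and is an assumption here,
not necessarily one of the indexes defined in Appendix A; the
function name and the data are hypothetical.

```python
from statistics import mean


def percent_savings(control_trials, device_trials):
    """Savings-type transfer index, in percent (illustrative only).

    control_trials: trials-to-criterion on the operational task for
        the group with no prior training (Group 2).
    device_trials: trials-to-criterion on the operational task for
        the group that first trained on the device (Group 1).
    Positive values suggest positive transfer; negative values
    suggest negative transfer.
    """
    control = mean(control_trials)
    device = mean(device_trials)
    if control <= 0:
        raise ValueError("control group mean must be positive")
    return 100.0 * (control - device) / control


# Hypothetical scores, invented for this example.
group_1 = [6, 8, 7, 7]     # trained on Device A first
group_2 = [10, 9, 11, 10]  # no prior training
savings = percent_savings(group_2, group_1)
```

A different performance measure (for example, first-trial
error rate) would give a different index from the same two
groups, which is the sense in which transfer is defined by the
paradigm and the measure used.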

Transfer has been the principal criterion of training
device effectiveness in most previous attempts to develop
methods for predicting device effectiveness, including all
of the TRAINVICE series (Wheaton, Fingerman, Rose, &
Leonard, 1976a; Wheaton, Rose, Fingerman, Korotkin, &
Holding, 1976b; Hirshfeld & Kochevor, 1979; Narva, 1979a,
1979b; Swezey & Evans, 1980; Faust, Swezey, & Unger, 1980).
The rationale for transfer as the criterion is
straightforward: Device 1 is more effective than Device 2 if, after
completing training on each device, trainees who used
Device 1 perform better (i.e., initial transfer) or achieve
proficiency faster (i.e., rate of skill acquisition) on the
operational task than trainees who used Device 2.
Transfer of training: Limitations. There are two
important criticisms of this "transfer" rationale for device
evaluation. First, some form of operational performance
must be measured. This calls for an elaborate specification
of "criterion performance," including such considerations
as allowable individual variation, control for
measurement error, alternative performance measures, etc.
Obviously, the more complex the operational task, the more
difficult such specifications are to elaborate. For
something complex like "Hit a moving target" in tank gunnery,
such elaborations rapidly become arbitrary (e.g., which of
myriad conditions should be tested? How reliable is the
weapon? Is a test on a controlled range at Fort Knox,
using targets that don't shoot back, an adequate surrogate
of "actual" combat? etc.). However, for many other tasks,
the specifications are much more straightforward (e.g.,
convert grid to magnetic azimuths; change the brake linings
on a jeep). More simply, there is a continuum of
operational task complexity that is reflected by criterion
measurement problems.1 Having chosen transfer as a
criterion of device effectiveness, one must be prepared to
deal with these measurement problems. Adequate measurement
of operational performance may often be difficult or, in
extreme cases, impossible. But this prospect should not
lead to the rejection of transfer as a criterion of device
effectiveness; if performance measurement is impossible,
surrogate measures of transfer could still be considered.

1 We discuss the practical issues of criterion testing in
a later section, where we also indicate how one would
validate a model that predicts transfer.


The second major criticism of the transfer rationale
is that it is too restrictive: it ignores the time, cost,
and effort associated with the actual accomplishment of
training.2 To use an extreme example, suppose two devices
demonstrate the same amount of transfer; however, trainees
on Device 1 must spend ten times longer practicing on it
than on Device 2. Clearly, these devices are not equally
effective except in the most general (transfer) sense.
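
The extreme example above can be made concrete with a short
Python sketch. It divides the time saved on the operational
equipment by the time spent on the device, a ratio of the kind
often called a transfer effectiveness ratio in the training
literature. The ratio is an assumption introduced here for
illustration and is not a measure proposed in this report; all
names and hours are hypothetical.

```python
def transfer_effectiveness_ratio(control_hours, transfer_hours, device_hours):
    """Operational hours saved per hour spent on the device.

    control_hours: hours to criterion on the parent equipment with
        no device training.
    transfer_hours: hours to criterion on the parent equipment after
        device training.
    device_hours: hours spent practicing on the device.
    """
    if device_hours <= 0:
        raise ValueError("device_hours must be positive")
    return (control_hours - transfer_hours) / device_hours


# Hypothetical figures: both devices save the same 5 operational
# hours, but Device 1 needs ten times as much practice time.
device_1 = transfer_effectiveness_ratio(20.0, 15.0, device_hours=20.0)
device_2 = transfer_effectiveness_ratio(20.0, 15.0, device_hours=2.0)
```

Under these made-up figures the two devices show equal
transfer, yet the ratio for Device 2 is ten times that for
Device 1, which is the distinction a pure transfer criterion
cannot express.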


Another way of stating this criticism is to argue that
a training device could and should be viewed as part of the
larger training program in which it is embedded: a device
is effective if it reduces the total time, cost, and effort
needed to bring soldiers to operational readiness on the
parent equipment. This more global view is in contrast to
the narrower transfer rationale, which views device
effectiveness solely in terms of the proficiency levels observed
on the parent equipment. We will expand upon this point in
a later section.

2 Traditionally, the "goodness" of any training system is
expressed along two dimensions: cost and effectiveness.
In addition to direct acquisition and production dollars,
"cost" has several other components that, in the training
device situation, are convertible to dollars. Device
facility requirements, student throughput,
student-to-instructor ratios, repair and replacement time,
device reliability, and other standard cost components fall
into this category. While these components can
(hypothetically) and should be dealt with systematically,
they are not within the scope of this current effort.
Nevertheless, we do treat general cost concepts as part of
an overall training system evaluation approach.

Transfer: Conclusion. From a common-sense
perspective, the transfer rationale is unarguable: unless use of a
training device promotes some positive benefit for
operational performance (a savings in time to reach criterion
proficiency, better first-trial performance, or whatever),
it cannot be considered "effective." Thus, positive
transfer, if the appropriate empirical evaluation could be
conducted, would appear to be a necessary condition for a
training device to be judged effective.

But,  positive  transfer,  even  when  it  can  be  assessed 
empirically,  surely  is  not  the  only  characteristic  of  an 
effective  training  device;  total  training  time,  cost,  and 
effort  must  also  be  considered. 

Other effectiveness criteria. If device evaluators
(or purchasers) were told that two devices produced equal
transfer scores (or that it was impossible to measure
operational performance), what else would they want to know
about the devices? The evaluators might want to know what
the trainee learns (or is supposed to learn) on each
training device and its relevance to the operational task. In
the example above, perhaps the extra time associated with
Device 1 is due to training more knowledge and skills than
is possible with Device 2 or even to training irrelevant
knowledge and skills. The evaluators also might want to
know if what is taught is taught efficiently. Similarly,
they also might inquire about the efficiency with which the
device prepares the trainee for the operational task. Both
"acquisition efficiency" and "transfer efficiency" would
entail an examination of the device's instructional
features. One can think of other kinds of information that
the evaluators also would like to have. Each of these
additional types of information is considered below as a
potential component of a criterion measure of device
effectiveness.

Other effectiveness criteria: Acquisition of skills
and knowledge. During the training device acquisition
process, device evaluators may face two types of problems.
First is the case where it is infeasible or impossible to
obtain training or transfer data. Second is the case where
empirical transfer-of-training evaluations are conducted
but the alternative devices do not differ on transfer index
values. In the former case, evaluators would have to
develop a surrogate measure or an estimate of "potential"
transfer; in the latter case, they would have to develop
different measures or estimates of effectiveness. In both
cases, the evaluators could expand their appraisal to look
at the content of training: what is taught and how
efficiently it is taught.

The "what" of training, when viewed as a surrogate
measure of transfer, is typically measured as the degree of
overlap between the content of the training objective and
the operational performance objective. An index based on
such overlap would represent the amount of required
knowledge and skills the trainee has learned (or,
conversely, still must learn when the trainee progresses to the
parent equipment).
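
A minimal Python sketch of such an overlap index follows. It
assumes that training content and operational requirements can
each be listed as discrete, comparably named skills, which is
a strong simplification; the function name and the skill labels
are hypothetical.

```python
def content_overlap(trained, required):
    """Summarize overlap between device content and operational content.

    trained: skills and knowledge taught on the device.
    required: skills and knowledge needed on the parent equipment.
    """
    trained, required = set(trained), set(required)
    if not required:
        raise ValueError("required content must not be empty")
    return {
        # Share of the operational requirement covered by the device.
        "covered_fraction": len(trained & required) / len(required),
        # What the trainee still must learn on the parent equipment.
        "still_to_learn": sorted(required - trained),
        # Device content with no counterpart in the operational task.
        "irrelevant": sorted(trained - required),
    }


# Hypothetical skill lists, invented for this example.
required = {"acquire target", "lay gun", "range target", "fire"}
trained = {"acquire target", "lay gun", "operate intercom"}
summary = content_overlap(trained, required)
```

Reporting the uncovered and irrelevant items alongside the
fraction keeps the index diagnostic rather than reducing it to
a single summary number.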

Concepts regarding the content and overlap of training
are usually derived from the various theoretical views of
transfer phenomena. (See Appendix A for further
elaboration of these theoretical views.) For example, based on
Thorndikean "identical elements," one could look for
specific high-fidelity simulations or duplications of the
parent equipment and task(s) in the training device. In
the extreme, those adopting this view might argue that the
effectiveness of training (and the criterion measure of
device effectiveness) depends exclusively upon the number
or percentage of these identical elements. According to
this view, if one is to maximize effectiveness one must
build the device to simulate the parent equipment to the
maximum extent possible; i.e., a high-fidelity simulation
is required in which the content of training almost
perfectly overlaps with that of the operational performance
objective. And, of course, many devices are designed and
developed with precisely this view in mind.

The "Osgoodian" view considers stimuli and responses
along a continuum of similarity. Thus, the relevant
content of training would be the stimuli and responses common
to both situations, weighted somehow by their degree of
similarity. An Osgoodian also might assert that a device
that was identical in all respects to the parent equipment
would be maximally effective. But he would allow for
degrees of similarity in overlapping content, and would be
able to generate predictions of different "degrees" of
transfer; further, based upon an inspection of the content
of training he would be able to predict the circumstances
leading to negative transfer.
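
The idea of similarity-weighted content can be sketched in
Python as follows. The scoring rule is a deliberately crude,
hypothetical stand-in and not Osgood's own formulation: each
task element receives a stimulus similarity and a response
similarity rating between 0 and 1, and its contribution is
positive when similar stimuli call for similar responses and
negative when similar stimuli call for different responses.

```python
def similarity_weighted_score(elements):
    """Average predicted transfer over task elements (illustrative rule).

    elements: iterable of (stimulus_similarity, response_similarity)
        pairs, each rated from 0.0 (unrelated) to 1.0 (identical).
    Returns a value from -1.0 (strong negative transfer predicted)
    to +1.0 (strong positive transfer predicted).
    """
    elements = list(elements)
    if not elements:
        raise ValueError("at least one element is required")
    total = 0.0
    for stimulus_sim, response_sim in elements:
        for value in (stimulus_sim, response_sim):
            if not 0.0 <= value <= 1.0:
                raise ValueError("similarity ratings must lie in [0, 1]")
        # Assumed rule: stimulus similarity scales the effect; response
        # similarity sets its sign and size (1 -> +1, 0 -> -1).
        total += stimulus_sim * (2.0 * response_sim - 1.0)
    return total / len(elements)


identical_device = similarity_weighted_score([(1.0, 1.0), (1.0, 1.0)])
conflicting_device = similarity_weighted_score([(1.0, 0.0)])
```

With this assumed rule, a device identical to the parent
equipment scores the maximum, and an element that presents the
same stimulus but demands a different response pulls the score
negative, mirroring the qualitative predictions described
above.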
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,  However,  neither  of  these  theoretical  perspectives  on 
the  content  of  training  addresses  another  commonly  used 
training  concept  --  namely,  enabling  skills  or  knowledges. 
These  are  "things"  that  are  necessary  for  operational  per¬ 
formance  but  are  not  themselves  directly  a  part  of  the 
criterion  performance.  More  generally,  an  enabling  skill 
or  knowledge,  once  learned,  increases  the  speed  or  ef¬ 
ficiency  of  the  learning  of  some  other  skill.  Gagne 
(1965),  for  example,  writes  about  hierarchies  of  skills  and 
knowledges,  where  lower-order  skills  are  necessary  to  learn 
higher-order  ones,  which  are  necessary  for  still  higher- 
orders,  and  so  on.  In  essence,  one  must  learn  to  walk 
before  one  can  learn  to  run.  There  need  be  no  "identical 
elements"  nor  "stimulus-response  similarities"  at  all  be¬ 
tween  the  lower-order  enabling  skills  acquired  in  the 
training  device  and  the  higher-order  skills  comprising 
operational  task  performance  on  the  parent  equipment. 

Many devices and training systems are designed and
developed to teach enabling skills. "General maintenance
trainers" are a good example: they are designed to teach
prerequisite knowledges and skills that will enable
trainees to acquire system-specific skills more easily.
The important point is that the content of training cannot
be delineated in terms of "identical elements" or
"stimulus-response similarities." The most suitable
vocabulary to describe this type of training content is
that used by cognitive psychologists (e.g., Neisser, 1976),
who talk of "knowledge structures" and "schemas." Training
consists of the building of an organized knowledge
structure about a topic. This structure has "slots" where new
information can be added to it. Thus, the goal of training
is to develop knowledge structures in trainees that will
enable them to incorporate new information -- the
operational task -- easily.

Regardless of one's perspective or vocabulary, it is
clear that an assessment of the content and relevance of
the training device is, or should be, part of the
characterization of a device's effectiveness. Content
specification in terms of the device-mediated learning objective is
obviously critical to the device designer/developer; it is
also important to the training program evaluator in that it
could serve as a surrogate measure when it is infeasible or
impossible to obtain an empirical assessment of transfer.

Other effectiveness criteria: Acquisition efficiency.
Suppose we have two devices, both producing the same
"amount" of transfer and/or both teaching the same content.
However, a trainee on one device takes ten times as long to
reach proficiency on that device (i.e., to acquire the
content) as it does a trainee on the other device.
Clearly, when everything else is equal, we would call the
device that promoted more rapid learning the more
"effective" one. The concept here is "efficiency": how well
(rapidly, cheaply) does the device train the required
content?

The "efficiency" of training typically is measured in
terms of the rate of acquisition of the training objective.
The resulting index would represent the time, cost, or
effort required to reach proficiency on the training device.

Some aspects of the evaluation of efficiency include
an examination of the device's instructional features and
its pattern of use. For example, several training experts
(e.g., Braby, Henry, Parris, & Swope, 1975) have developed
prescriptive methods for the design of training based on
analyses of instructional features. Typically, the form of
the argument is, "In order to teach task type X
effectively, a device must have feature Y." These arguments are
then combined to produce preliminary device specifications.
Clearly, it is a relatively straightforward matter to turn
this argument around to generate evaluative criteria for
assessing device effectiveness. Thus, "Device 1 has
feature Y; therefore, it will teach task type X
effectively." If X is what we want to teach, Device 1 will
be a more effective device than Device 2, which does not
have feature Y.

However, care must be taken when examining
instructional features, in that "more" does not necessarily imply
"better." Devices with video playback and freeze-frame
capabilities are not always better than devices without
them (Swezey, Criswell, Huggins, Hays, & Allen, 1983). The
effectiveness of a given feature will vary as a function of
the training content. Much of the empirical research in
this area uses "task type" as the descriptive vocabulary
for training content (Braby et al., 1975; Wheaton et al.,
1976a).
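
The "turned around" prescriptive argument, together with the
caution that more features are not necessarily better, can be
sketched in Python as a lookup keyed by task type. The table
entries below are invented for illustration and are not taken
from Braby et al. or any other cited source; the function and
variable names are likewise hypothetical.

```python
# Hypothetical prescriptive table: task type -> features assumed to help.
# These pairings are made up for this example.
HELPFUL_FEATURES = {
    "procedure following": {"freeze", "playback"},
    "continuous tracking": {"augmented feedback"},
}


def appraise_features(device_features, task_types, table=HELPFUL_FEATURES):
    """Compare a device's features with those prescribed for its tasks."""
    have = set(device_features)
    prescribed = set()
    report = {}
    for task in task_types:
        wanted = table.get(task, set())
        prescribed |= wanted
        report[task] = {
            "present": sorted(wanted & have),
            "missing": sorted(wanted - have),
        }
    # Features not prescribed for any listed task are flagged, not rewarded.
    report["not prescribed"] = sorted(have - prescribed)
    return report


device_1 = appraise_features(
    {"freeze", "playback", "motion platform"},
    ["procedure following"],
)
```

Because the appraisal is indexed by task type, a feature earns
credit only where the table says it helps; a feature outside
the prescriptions is listed separately instead of raising the
device's standing.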

Other  effectiveness  criteria*  Transfer  efficiency. 
Suppose  that  two  devices  train  the  same  content,  and  do  so 
equally  efficiently.  They  will  not  necessarily  produce  the 
same  amount  of  transfer.  This  fact  gives  rise  to  another 
potential  component  of  device  effectiveness  --  namely  the 
efficiency  with  which  the  trainee  is  prepared  for  acquiring 
the  skills  and  knowledges  that  still  must  be  learned  on  the 
parent equipment. Instructional features can be
incorporated in a device that enhance the rate of
acquisition of knowledge and skills on the parent equipment
independently  of  enhancing  the  rate  of  acquisition  of  the 
device-mediated  training  objective. 

A  further  fairly  subtle  point  is  that  features  that 
enhance  transfer  may  not  necessarily  enhance  acquisition. 
Suppose  a  training  device  had  a  feature  that  allowed  for 
simulation  of  environmental  conditions  found  in  the  opera¬ 
tional  situation  --  noise,  heat,  darkness,  etc.  This  fea¬ 
ture  would  undoubtedly  enhance  transfer  to  these  situa¬ 
tions.  However,  its  use  would  surely  slow  down  the  rate  of 
skill  acquisition  or  learning  within  the  device. 

Thus,  transfer  efficiency  seems  to  be  another  distinct 
component  of  device  effectiveness,  in  addition  to  those 
previously  discussed:  transfer,  the  content  of  training, 
and  the  efficiency  of  training.  Are  there  other  concepts 
that  have  been  used  or  suggested  as  device  effectiveness 
measures? 

Other  effectiveness  concepts.  Most  other  concepts 
that  have  been  considered  as  potential  measures  of  device 
effectiveness  fall  into  the  category  of  "user  acceptance" 
(Mackie, Kelly, Moe, & Mecherikoff, 1972). This usually
has two parts: instructor acceptance and trainee
acceptance. A device presumably will not be effective if
instructors and trainees won't or can't use it. Such might
be  the  case,  for  example,  if  there  were  a  significant 
burden  added  to  instructors'  workloads  by  requiring  them  to 
learn  to  operate  a  complicated  device,  if  trainees  had  to 
learn  excessive  "extra-job"  skills  just  to  operate  a 
device,  or  if  either  group  felt  the  device  was  providing 
irrelevant  training. 

These  are  important  considerations,  certainly.  A 
device  should  not  be  built  or  purchased  that  is  too  dif¬ 
ficult  or  awkward  for  instructors  and  trainees  to  use. 
Presumably,  indexes  of  instructor  and  trainee  workloads 
could  be  incorporated  in  an  assessment  of  device  effective¬ 
ness.  "Extra-job"  skills  could  be  incorporated  as  part  of 
an  index  of  the  content  and  relevance  of  training.  On  the 
other hand, beyond emphasizing sound human-engineering
practices  (e.g.,  Smode,  1972),  there  is  little  that  can  be 
done  by  the  device  designer  to  increase  the  probability 
that  the  device  will  be  considered  relevant  to  instructors 
and  trainees.  Some  might  argue  that  acceptance  will  in¬ 
crease  if  the  device  can  be  made  more  realistic  --  in  other 
words,  to  make  it  simpler  to  relate  the  training  to  actual 
job  performance.  However,  increased  realism  might  or  might 
not  lead  to  more  effective  training,  especially  given  the 
arguments made above concerning enabling skills. The real
issue is how best to convince instructors and trainees that
the training system will lead to better job performance.
In our opinion, the best way to do this is by providing
them  with  empirical  evidence  of  successful  training. 

Summary:  Device  effectiveness.  The  first  step  in 
developing  an  analytic  procedure  for  predicting  the  poten¬ 
tial  effectiveness  of  training  devices  is  to  pin  down  just 
what we mean by the term "device effectiveness." In the
preceding section we have examined several different and
general conceptions of effectiveness: 1) an effective
device promotes transfer of training to the parent equipment;
2) an effective device enables trainees to acquire
necessary skills and knowledge rapidly; 3) an effective
device is accepted by the trainees and instructors who
interact with it.

The criterion most often used to characterize training
device effectiveness is transfer of training, based on an
estimate of trainee proficiency on the parent equipment
relative to the proficiency of some type of control group
on that same equipment. As we indicated earlier, when the
estimate is based on an empirical investigation, transfer
can be expressed in several different ways depending upon
the specific experimental paradigm employed. For example,
relative to the performance of a particular type of control
group,  device  effectiveness  can  be  stated  in  terms  of  the 
level  of  trainee  proficiency  on  the  parent  equipment  after 
a  specified  amount  of  time  (or  trials)  and/or  as  the  amount 
of  time  (trials)  required  to  reach  a  specified  level  of 
proficiency. 
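
The two ways of stating transfer described above can be illustrated with a small calculation. The sketch below is not taken from this report: it assumes the commonly used percent-savings form (control-group trials minus trained-group trials, divided by control-group trials) and a simple difference in proficiency after a fixed amount of practice. The function names and the numbers are hypothetical.

```python
def savings_percent(control_trials, experimental_trials):
    """Percent reduction in trials (or time) needed to reach a specified
    proficiency level on the parent equipment, relative to a control group.
    Assumed form: 100 * (control - experimental) / control."""
    if control_trials <= 0:
        raise ValueError("control_trials must be positive")
    return 100.0 * (control_trials - experimental_trials) / control_trials


def proficiency_difference(control_score, experimental_score):
    """Difference in proficiency on the parent equipment after a specified
    amount of time or number of trials (trained group minus control group)."""
    return experimental_score - control_score


# Hypothetical data: the control group needs 20 trials to reach criterion,
# the device-trained group needs 15.
trials_saved = savings_percent(20, 15)

# Hypothetical data: after 10 trials the control group scores 60 and the
# device-trained group scores 72 on the same proficiency measure.
score_gain = proficiency_difference(60, 72)
```

Either index depends on the control group chosen, which is why the text ties each statement of effectiveness to a particular experimental paradigm.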

A  second  component  or  criterion  of  device  effective¬ 
ness  is  the  skills  and  knowledge  acquired  during  training, 
expressed  as  an  estimate  of  trainee  proficiency  on  the 
training  device  per  se.  When  based  upon  an  empirical  as¬ 
sessment,  this  estimate  also  can  be  expressed  in  different 
ways. For example, effectiveness can be characterized in
terms of the level of trainee proficiency on the device after
a fixed amount of practice (time, trials) or as the
amount  of  practice  required  to  attain  a  specified  level  of 
proficiency.  In  this  connection,  we  noted  that  aspects  of 
training  external  to  and  apart  from  the  device  (e.g.,  cour¬ 
ses  and  lessons,  classroom  exercises,  other  training 
devices,  etc.)  may  nevertheless  contribute  to  proficiency 
on  the  device. 

A third component of device effectiveness is user
acceptance. This concept is typically operationalized in
terms of trainee and instructor ratings. The ratings are
obtained on such training device dimensions as fidelity or
realism,  convenience  of  use,  and  the  perceived  value  of 
training. 

Although we have treated these notions of device
effectiveness separately, we do not mean to imply that they
are necessarily independent, alternative, or competitive
criteria. Rather, we view them as useful and complementary
components  of  an  effectiveness  criterion  that  is  inherently 
multidimensional.  To  support  the  evaluation  of  a  training 
device  we  would  like  empirical  assessments  of  each  com¬ 
ponent,  whenever  possible.  While  it  may  be  highly  desir¬ 
able  to  determine  how  much  transfer  is  associated  with  a 
given device, such a determination may not be feasible; or
if feasible may be inconclusive; or when conclusive, may
not  tell  the  whole  story.  For  these  reasons,  the  empirical 
evaluation  of  a  training  device  should  encompass  considera¬ 
tion  of  other  components  as  well.  Similarly,  procedures 
for  forecasting  device  effectiveness,  which  heretofore  have 
focused  entirely  on  transfer  of  training,  also  need  to 
adopt  this  broader  perspective. 

This  brings  us  to  one  of  the  most  fundamental  issues 
in this paper: How are we to proceed with the evaluation
of a training device when the various components of device
effectiveness cannot be assessed empirically, the
situation typically confronting the designers and
developers of major training devices? The answer lies in
identifying surrogates for the components of device
effectiveness discussed above, and then using analytic
procedures to generate estimates of the various surrogates.
For example, it might be possible to use amount of overlap
in the content of training and operational (i.e., parent
equipment) performance objectives as an estimate of potential
transfer of training. Similarly, analyses of the content
of training and performance objectives, coupled with
an appraisal of instructional features, might provide
estimates of acquisition or transfer efficiency. One objective
of the present project is to identify such surrogates
and to develop procedures for their assessment.
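
As an illustration of the overlap surrogate mentioned above, the following minimal sketch treats each performance objective as a set of task names and reports the share of operational tasks that the training objective also covers. This is only one of several plausible overlap measures and is an assumption, not the procedure developed in this project; the function name and the task labels are hypothetical.

```python
def objective_overlap(training_tasks, operational_tasks):
    """Fraction of operational (parent equipment) tasks that also appear in
    the device-mediated training objective. Returns a value from 0.0 to 1.0."""
    training = set(training_tasks)
    operational = set(operational_tasks)
    if not operational:
        raise ValueError("operational_tasks must not be empty")
    return len(training & operational) / len(operational)


# Hypothetical task lists for a device and its parent equipment.
device_tasks = ["load", "aim", "fire", "clean"]
parent_tasks = ["aim", "fire", "track", "report"]

overlap = objective_overlap(device_tasks, parent_tasks)
```

A higher value would be read as greater potential transfer, subject to the cautions about efficiency and instructional features raised earlier in this section.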

Issue: What Are the Variables Influencing Device
Effectiveness?

During the design, development, and evaluation of
training devices we need to consider the independent variables
hypothetically influencing device effectiveness for
two important reasons. First, when we are able to carry
out an empirical evaluation of a training device, we will
wind up with a multidimensional assessment that is almost
entirely outcome oriented. That is, we will describe the
device in terms of a certain amount of transfer, a
particular rate of skill and knowledge acquisition, etc.
If at all possible, it would be desirable to augment such
an  appraisal  with  more  diagnostic  information  that  suggests 
how  particular  independent  variables  contribute  to  measured 
effectiveness.  Armed  with  such  knowledge,  it  would  then  be 
possible  to  entertain  "what  if"  questions,  contemplating  in 
at  least  a  rough  fashion  how  device  effectiveness  might 
vary  were  changes  in  selected  independent  variables  intro¬ 
duced.  In  this  application,  information  about  the 
relationships  between  independent  variables  and  effective¬ 
ness  criteria  would  be  used  to  prescribe  design  modifica¬ 
tions  intended  to  enhance  device  effectiveness. 

The  second  reason  that  independent  variables 
hypothetically  influencing  device  effectiveness  are  of  in¬ 
terest  is  because  an  empirical  evaluation  of  effectiveness 
often  may  not  be  feasible.  In  this  case  we  would  want  to 
conduct  an  analytic  appraisal  and  would  need  a  set  of 
predictor  variables  in  terms  of  which  to  couch  our  effec¬ 
tiveness  forecasts  or  estimates.  That  is,  given  informa¬ 
tion  about  selected  independent  variables,  we  would  attempt 
to  predict  training  device  effectiveness  on  a  variety  of 
surrogate criterion measures. There also, of course, is
diagnostic value in such an appraisal. In principle, we
could  explore  the  manipulation  of  specific  independent 
variables,  estimating  their  influence  on  effectiveness,  and 
use  the  results  of  various  changes  to  inform  us  about  the 
probable  value  of  different  design  modifications. 

Given  a  multidimensional  criterion  of  device  effec¬ 
tiveness  that  includes  facets  of  both  initial  learning  and 
subsequent  transfer,  we  can  think  of  many  variables  that 
potentially  may  influence  device  effectiveness,  and  there¬ 
fore  should  be  considered  for  diagnostic  and  forecasting 
purposes. Reviews of the literature and analyses of training
phenomena (e.g., Miller, 1934; Valverde, 1968; Blaiwes
& Regan, 1970; Blaiwes, Puig, & Regan, 1973; Aagard &
Braby, 1976; Wheaton, Rose, Fingerman, Korotkin, & Holding,
1976b; Royer, 1979; Hays, 1980; Rose, 1980; Rose, Allen, &
Johnson, 1982; Rose, McLaughlin, & Felker, 1981) point
toward  a  myriad  of  relevant  variables  for  which  there  is 
empirical  or  theoretical  support. 

Based upon a review of the literature, an examination
of available effectiveness forecasting models, and a
multidimensional conception of training device effectiveness,
there appear to be five categories of independent predictor
variables that warrant consideration. That is, these
categories appear salient. If we were to manipulate
variables within any of these categories we would expect to
observe certain specifiable changes in particular components
of the device effectiveness criterion. We discuss
each category briefly.

Trainee quality. Because trainees are the primary input to the
training process, we are concerned about a variety of trainee
variables. These include such concepts as trainee intelligence,
aptitude or ability, motivation to learn, and
prior experience, as reflected in entry levels of skill and
knowledge and initial levels of proficiency on the training
device or the parent equipment. Collectively, such variables
represent the quality of incoming trainees and are
usually  manipulated  as  part  of  some  earlier  personnel 
selection  or  classification  procedure.  It  is  hypothesized 
that  higher  quality  will  be  reflected  in  faster  rates  of 
skill  acquisition  and  greater  or  more  rapid  transfer. 

In many contexts, personnel variables of this type are
treated as within-group individual differences, with a
focus on each individual. Traditionally, however, training
device designers and evaluators have addressed quality of
personnel essentially as a between-group variable. That
is, device developers have predicated certain design
decisions on the characteristics of the typical, average,
or modal trainee who will proceed through training. Device
evaluators  have  attempted  to  match  experimental  (trained) 
and  control  (untrained)  groups  on  the  basis  of  trainee 
quality  during  empirical  assessments  of  transfer  of 
training. 

Preliminary training. Variables within this category
reflect the type and amount of enabling or prerequisite
instruction and training that trainees receive prior to their
exposure to the training device. Indoctrination and orientation
sessions, procedural training, demonstrations, lectures
and reading assignments, etc., that enhance the
quality of trainees and better prepare them for device-mediated
training fall within this category. It is
hypothesized that the provision of enabling skills and
knowledge, proficiency in part-task performance, etc., will
be  associated  with  more  rapid  acquisition  of  training 
device-mediated  objectives  and  better  (greater,  faster) 
transfer. 

Task type. The types of tasks comprising a device-mediated
training objective or the operational performance
objective associated with the parent equipment are
important considerations. The type of task includes such
variables as the number of task steps, sequential
dependencies  among  steps,  task  aiding,  cognitive  and 
psychomotor  demands,  etc.  Systematic  manipulation  of  these 
types  of  variables  is  known  to  influence  acquisition  and 
retention  of  skilled  performance  and  should  influence  ac¬ 
quisition  and  transfer  components  of  device  effectiveness. 

Device  type.  This  category  includes  variables  that 
represent engineering and instructional features of a
training  device.  These  features  are  the  ones  that  typical¬ 
ly  come  to  mind  when  designers  and  evaluators  ponder  about 
characteristics  that  may  enhance  or  degrade  training  device 
effectiveness. 

The  subset  of  so-called  engineering  variables  reflects 
such  concepts  as  the  fidelity  of  simulation  or  similarity 
between  the  training  device  and  the  parent  equipment  it 
presumably  represents.  In  spite  of  a  voluminous  literature 
on  concepts  like  engineering,  environmental,  or  psychologi¬ 
cal  fidelity,  or  physical  and  functional  similarity,  their 
influence on components of device effectiveness is not
clearly  understood.  Very  generally  speaking,  increases  in 
similarity  between  the  device  and  parent  equipment 
facilitate  transfer  of  training.  However,  very  high 
similarity or fidelity does not ensure better transfer;
transfer of training can occur when fidelity, at least as
conventionally measured, is quite low; and there are
conditions of stimulus and response similarity that can
lead  to  at  least  initial  if  not  prolonged  negative  transfer 
of  training. 

The  subset  of  instructional  features  includes  vari¬ 
ables  that  are  intended  both  to  facilitate  acquisition  of 
skill  in  the  training  device  and  to  promote  transfer  of 
training  to  the  parent  equipment.  These  variables  include 
sequencing of stimulus or problem difficulty, provision of
feedback to both trainees and instructors, manipulation of
signal-to-noise ratios, measurement and recording of
trainee performance, adaptation of type and level of
instruction to level of proficiency, etc.

Training context. This category subsumes a variety of
ancillary but potentially important variables that do not
fit neatly into any of the prior categories. The variables
are descriptive in one way or another of the larger training
program or context within which a training device is
utilized. For example, contextual variables include the
scheduling of training (e.g., the type, amount, and distribution
of practice) as well as the performance criteria
that signal a cessation of training on the device and
adequate proficiency on the parent equipment (e.g.,
first-trial or longer-term transfer). They also include
instructor proficiency as well as user acceptance of the
device.¹

All  of  the  variables  subsumed  under  these  categories 
are familiar. The issue is, which ones of this large array
need to be considered, particularly in the course of
developing a procedure to forecast training device
effectiveness? In general, existing methods have focused almost
exclusively  on  training  device  parameters,  choosing  largely 
to  ignore  extra-device,  training  program  variables.  Two 
rationales  have  been  advanced  for  this  restricted  focus. 

The first is that forecasting procedures do not want to
"penalize" a device -- e.g., with a lower effectiveness
score -- simply because it might be used inappropriately,
introduced  without  prerequisite  instruction  if  required,  or 
staffed  and  operated  by  poorly  trained  instructors,  etc. 

The second and more pragmatic reason is that information
about the training program or device utilization is seldom
supplied along with a detailed description of the training
device. At best, therefore, present forecasting methods
"reward" a device that allows for flexibility of
utilization, but do not provide for evaluation of the
device in terms of a specific utilization plan or training
program context. Below, we describe a general program
evaluation framework that can be used to organize the
effectiveness criterion and predictor variables discussed so
far.

¹ User acceptance, as our earlier discussion suggests, can
be viewed as a criterion of device effectiveness. Our
preference, however, is to treat it as an intervening
variable. User acceptance, therefore, can exert an
influence on the primary acquisition and transfer
components of device effectiveness.

Theoretical Issues: Conclusion. A Device Effectiveness
Evaluation Framework

Throughout  the  discussion  of  criterion  and  predictor 
variables  of  device  effectiveness  we  have  found  it  useful 
to  broaden  our  perspective  on  device  evaluation:  to  con¬ 
sider  criteria  of  effectiveness  in  addition  to  transfer  of 
training;  to  examine  predictor  variables  lying  beyond  the 
domains  of  task  and  device  characteristics  that  tradition¬ 
ally  have  been  examined  during  empirical  and  analytic  as¬ 
sessments  of  effectiveness.  We  believe  that  a  training 
device, no matter how simple (e.g., a part-task trainer) or
sophisticated (e.g., a full-scale weapon system simulator),
is but one component of a larger training program. It is
possible to compare training devices or even alternative
training concepts that are in some sense interchangeable
within a given training program, but it does not make much
sense  to  compare  or  evaluate  them  in  the  absence  of  such  a 
broader  context. 

Given  this  larger  perspective,  it  follows  that  a 
training device cannot be meaningfully evaluated without
considering its intended role in the overall program,
including the plan for its use. Thus, what needs to be
evaluated or compared is not the training device(s), but
the entire training program(s). This includes the
specification of training materials (documentation,
devices, and instructors), the sequence of training or the
program  of  instruction,  the  level  of  instructor  training 
required  and  provided,  the  amount  of  instructor  and  student 
time  involved,  and  the  criteria  for  successful  completion 
of  the  training  program  and  operational  proficiency  on  the 
parent  equipment. 

How  does  one  evaluate  an  entire  training  program?  In 
other  words,  given  certain  inputs  (knowledges,  skills, 
abilities,  and  other  characteristics  of  the  trainee  popula¬ 
tion)  and  certain  desired  outputs  (proficiency  requirements 
of  the  operational  situation),  how  do  we  evaluate  the 
program  that  is  designed  to  operate  on  the  input  to  achieve 
the  desired  outcome? 


Ultimately, we can express program effectiveness in
terms of the extent to which terminal program objectives
are met. Those objectives are to get trainees to criterion
levels of operational proficiency as quickly, cheaply, and
safely as possible. However, it often is infeasible or
impossible to determine whether terminal program objectives
have been met. Moreover, by focusing exclusively on terminal
outcomes, one may neglect several other important
evaluative criteria of the types discussed earlier that
provide valuable diagnostic information -- why the program
was effective or not effective.

Evaluation  issues  of  these  types  have  abounded  in  many 
other  contexts,  most  notably  during  attempts  to  evaluate 
the impact of major social programs (e.g., Cronin &
Bourque, 1981; Cronin, Drury, & Gragg, 1983). Although
these programs (e.g., criminal justice, education, poverty,
health care delivery, etc.) and the specific indexes of
program impact developed for them have no bearing on training
device evaluation, the basic model of impact assessment
that has been employed is directly relevant: frequently, it
was infeasible or impossible to measure terminal program
objectives directly; diagnostic information was critical to
the evaluation; there were many "extraneous" (to the
program) variables that affected the outcomes.




As shown in Figure 1, the model is based on a program
rationale, or network of hypotheses, which makes explicit
the dynamics of the cause-effect relationships being
investigated.

Figure 1. General model of the program rationale.

The  methodological  focus  in  this  model  is  on  the  hypotheses 
that  relate  events  at  one  stage  to  those  at  the  next.  The 
certainty with which outcomes can be attributed to inputs
under program control is vastly enhanced by this technique.
An important consequence of this feature is that the assessment
does not treat an intervention program as an entity
that succeeds or fails in accordance with the average
impact yielded by the type of approach which characterizes
the program. The aim is to identify the individual components
that should be modified or attended to when further
implementation or evaluation is planned.

This general type of program evaluation model seems
perfectly suited to the assessment of training devices. It
suggests that we examine the training program rationale:
the specific cause and effect linkages that explain why and
how certain inputs (planned and unplanned) lead to certain
outcomes. Development and analysis of the rationale
require description of many aspects of the training
program, including: the input and ultimate output, all of
the  intermediate  outcomes,  the  linkage  between  intermediate 
outcomes,  the  variables  potentially  influencing  each  inter¬ 
mediate  outcome,  and  the  relationships  between  the  inter¬ 
mediate  outcomes  and  ultimate  program  output. 

An example of a rationale that links independent
predictor variables to various components of training
device effectiveness might look something like the
following:

1.  Program  inputs  are  the  learning-relevant  charac¬ 
teristics  of  the  trainees.  These  may  be  knowledges, 
skills,  abilities  and  other  characteristics  including 
trainee  motivation  to  learn.  We  have  already  mentioned 
such  variables  under  the  general  rubric  of  trainee  quality, 
a  class  of  variables  that  can  be  manipulated  to  influence 
estimates  of  device  effectiveness. 


2. Program activity I is the preliminary training and
instruction that trainees receive as part of the overall
training program, prior to their practicing on the training
device. Training programs obviously can differ widely in
the  amount  and  type  of  such  support. 

3. Program activity II is the training mediated by
the training device per se. Its description would include
the specific training objective(s), the types of tasks contained
in the device-mediated training objective, and the
instructional  features  with  which  the  device  is  equipped. 
Physical  and  functional  similarity  as  well  as  various  types 
of  fidelity  would  also  be  included  as  part  of  the  training 
device  description. 

4. Training context I includes everything that potentially
might affect the trainee-device interaction above
and beyond the program elements already described. The
context could include instructor proficiency, user acceptance,
device reliability and maintainability, practice
schedules, integrity (with respect to some plan) of device
implementation, and interactions among these and other
variables. 


The  training  device  evaluation  model  so  far  is: 


5. Intermediate outcome I is trainee performance on
the  training  device.  This  first  component  of  device  effec¬ 
tiveness  can  be  expressed  in  terms  of  both  time  and  ac¬ 
curacy  measures  of  performance  and  in  terms  of  "process" 
information  (e.g.,  time,  trials,  acquisition  rate,  etc.). 
The  focus  is  on  the  skills  and  knowledge  that  are  imparted 
through  device-mediated  training  as  well  as  on  the  ef¬ 
ficiency  with  which  the  training  objective  is  accomplished. 
If  trainee  proficiency  on  the  device  does  not  reach  expec¬ 
ted  levels,  then  we  would  perform  diagnostic  analyses  to 
seek  the  reasons  for  such  a  shortcoming.  Toward  that  end 
we  would  examine  the  trainee  input,  the  supplemental  in¬ 
struction,  characteristics  of  the  training  device,  and 
facets  of  the  larger  program  context. 




6. Program activity III is whatever trainees might do
next, such as receiving additional training of some sort or
being tested on the parent equipment. In the latter case,
we would describe the parent equipment in terms of the
tasks comprising the operational performance objective(s)
and  its  overall  similarity  to  the  training  device. 

7. Training context II includes many of the same
variables considered under the Training Context I rubric.
We are interested in any variables influencing the
trainee's  interaction  with  the  parent  equipment  including, 
for  example,  instructional  features  of  the  training  device 
that  are  intended  to  facilitate  the  interaction,  the  condi¬ 
tions  of  performance,  the  amount  of  time  that  has  elapsed 
since  cessation  of  device-mediated  training,  etc. 

8. Intermediate outcome II is trainee performance on
the parent equipment. This may include measures of initial
and  later  performance  as  well  as  several  types  of  process 
information,  all  of  which  may  be  cast  into  transfer  of 
training  indexes. 

9.  Longer-term  outcomes  represent  the  extended  ef¬ 
fects  of  the  training  program.  These  would  include,  for 
example,  performance  on  the  parent  equipment  under  wartime 
conditions,  presumably  the  ultimate  criterion  of  device 
effectiveness. 
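
The nine elements above form an ordered chain, and the diagnostic use described under item 5 amounts to looking back along that chain from a disappointing outcome. The sketch below is a hypothetical representation, not part of the report's method: the `Stage` class, the `RATIONALE` list, and the `preceding_stages` helper are names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    kind: str  # "input", "activity", "context", or "outcome"
    name: str


# The rationale as an ordered list, following items 1 through 9 above.
RATIONALE = [
    Stage("input", "Program inputs"),
    Stage("activity", "Program activity I"),
    Stage("activity", "Program activity II"),
    Stage("context", "Training context I"),
    Stage("outcome", "Intermediate outcome I"),
    Stage("activity", "Program activity III"),
    Stage("context", "Training context II"),
    Stage("outcome", "Intermediate outcome II"),
    Stage("outcome", "Longer-term outcomes"),
]


def preceding_stages(stage_name):
    """Return the names of all stages that come before the named stage,
    i.e. the places to look when that stage falls short of expectations."""
    names = [stage.name for stage in RATIONALE]
    index = names.index(stage_name)  # raises ValueError for unknown names
    return names[:index]


# Stages to examine if trainee proficiency on the device is below expectation.
device_diagnostics = preceding_stages("Intermediate outcome I")
```

For "Intermediate outcome I" the helper returns the trainee input, the preliminary training, the device-mediated training, and the first training context, which matches the diagnostic targets named in item 5.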
The complete program evaluation rationale would be:


We are suggesting that this general program evaluation
framework can be used to assess training device effectiveness
in terms of the four criterion constructs discussed
earlier. There is an acquisition construct representing
what is learned on the training device and an acquisition
efficiency construct, representing how well (how quickly,
cheaply, etc.) the device trains what it is supposed to
train. Acquisition of knowledge and skill related to the
training objective(s) is measured directly by Intermediate
Outcome I, which also provides for assessment of
acquisition efficiency in terms of whatever process indexes
are deemed appropriate. At this stage in the evaluation,
specific skill acquisition outcomes are interpreted in
light of information about trainee, preliminary training,
training device, and contextual variables.

There also is a transfer construct of device effectiveness,
indicating what the trainee will still have to
learn after "graduating" from the training device, and a
transfer efficiency construct reflecting how well the
device prepares the trainee for the operational task(s).
Both constructs are measured at Intermediate Outcome II by
whatever  transfer  index  is  judged  suitable  (e.g.,  initial 
transfer,  savings,  etc.).  At  this  later  stage  in  device 
evaluation,  specific  transfer  of  training  outcomes  are  in¬ 
terpreted  in  light  of  information  about  the  degree  of  over¬ 
lap  between  training  and  operational  performance  objec¬ 
tives,  trainee  proficiency  on  the  training  device,  charac¬ 
teristics  of  the  device  and  contextual  variables. 

In  essence,  the  independent  and  criterion  variables 
that  we  have  described,  when  considered  within  a  program 
evaluation  framework,  define  a  model  of  training  device  ef¬ 
fectiveness.  A  particular  training  program  describes  a 
path  between  initial  inputs,  program  activities  and 
intermediate  outcomes.  The  distance  to  the  first 
intermediate  outcome  can  be  expressed  in  terms  of  a
"deficit"  --  how  much  the  trainee  must  learn  in  order  to
attain  criterion  proficiency  on  the  device,  how  long  it
will  take  him  to  reach  that  criterion,  and  how  much  it  will
cost.  The  distance  between  the  first  intermediate  outcome
(i.e.,  the  acquisition  of  skill  and  knowledge  on  the
device)  and  the  second  intermediate  outcome  (i.e.,  the
level  of  proficiency  required  on  the  parent  equipment)  also
can  be  expressed  as  a  deficit  --  how  much  the  graduate
trainee  still  has  to  learn,  how  long  it  will  take,  etc.
Different  training  devices  have  different  distances  or
deficits;  the  four  suggested  criterion  constructs  of  effectiveness
address  the  magnitude  of  these  distances;  the  five
different  classes  of  independent  variables  address  how
rapidly  they  will  be  traversed.

The  concept  of  a  deficit  model  of  training  device  ef¬ 
fectiveness  is  depicted  in  more  detail  in  Figure  2  on  the 
next  page.  Figure  2  is  a  stylized  representation  of 
various  aspects  of  training  devices,  the  operational  task, 
and  the  relationships  among  the  several  components. 

Figure  2.  Deficit  model  of  training  device  effectiveness.

A  =  initial  skills  and  knowledge  of  trainee;  performance  on  operational  task  prior  to
training  on  device  (TD)

B  =  skills  and  knowledge  of  trainee  at  completion  of  TD1  regimen;  criterion  performance
on  TD1

C  =  skills  and  knowledge  of  trainee  at  completion  of  TD2  regimen;  criterion  performance
on  TD2

D  =  skills  and  knowledge  needed  to  perform  operational  task;  criterion  performance  on
operational  equipment

B',  C'  =  skills  and  knowledge  needed  to  perform  operational  task  possessed  by  trainee  after  TD
exposure;  performance  on  operational  equipment

AD  =  time,  cost  associated  with  learning  D  on  operational  equipment

AB,  AC  =  time,  cost  associated  with  learning  B,  C  on  TDs

BD,  CD  =  time,  cost  associated  with  learning  D  given  learning  on  TDs

ABD,  ACD  =  total  time,  cost  associated  with  learning  D  for  each  TD

Point  A  represents  the  initial  skills  and  knowledge  possessed
by  the  trainee  prior  to  exposure  to  the  training
device  or  the  operational  equipment,  and  the  expected  level
of  trainee  performance  on  the  operational  task  prior  to
training.  Point  D  represents  the  skills  and  knowledge  of
performance  on  the  operational  task,  and  the  criterion
level  needed  to  perform  the  operational  task  (using  the  actual
equipment).  Thus,  the  AD  "vector"  represents  a  performance
deficit  and  the  learning  that  must  occur  if  the
trainee  is  to  learn  to  perform  the  operational  task.  In
addition  to  representing  the  learning  that  must  take  place,
this  vector  also  represents  the  time,  cost,  and  resources
necessary  to  train  the  operational  task  using  only  the
operational  equipment.

Point  B  represents  the  skills  and  knowledge  possessed 
by  the  trainee  at  the  completion  of  training  using  a  training
device.  It  also  represents  the  criterion  performance
level  on  the  training  device,  along  with  the  associated
time,  cost,  and  resources;  the  vector  BD  represents  the
learning  (and  associated  time,  cost,  and  resources)  that  is
necessary  to  acquire  the  appropriate  operational  skills  and
knowledge  following  training  on  the  device.  The  vector  ABD
is  then  the  total  time,  cost,  and  resources  associated  with
learning  D  using  the  training  device.  Point  C  and  its  associated
vectors  represent  a  second  training  device.  (This
point  is  included  in  Figure  2  to  allow  for  situations  where
alternative  training  devices  are  to  be  compared  to  each
other.)  The  points  B'  and  C'  represent  the  skills  and
knowledge  needed  to  perform  the  operational  task  that  are
possessed  by  the  trainee  after  exposure  to  the  respective
training  devices.  Hence,  B'  and  C'  equate  to  the  trainee's
level  of  performance  on  the  operational  task  after  comple¬ 
tion  of  the  training  device  regimen  and  prior  to  any  fur¬ 
ther  practice  or  training  on  the  parent  equipment. 

The  basic  rationale  for  the  use  of  a  training  device
in  terms  of  Figure  2  is  that  the  ABD  vector  will  be  "shorter"
than  the  AD  vector.  That  is,  the  total  training
cost/time  will  be  less  when  a  training  device  is  used  than 
when  the  operational  equipment  itself  is  used  as  a  trainer. 

The  ideal  training  device  evaluation,  especially  when 
alternative  devices  or  concepts  are  to  be  compared,  is  to 
measure  or  estimate  ABD  and  ACD,  the  total  time  and  cost
associated  with  learning  D  for  each  training  device,  con¬ 
trasted  according  to  whatever  rule  the  Army  may  consider 
appropriate  (e.g.,  cheaper,  faster,  a  cost-time  ratio, 
greater  proficiency  after  a  fixed  amount  of  time,  etc.). 

This  evaluation  has  two  major  components:  an  "acquisition"
component,  conceived  as  a  determination  of  the
time/cost  (efficiency)  of  training  to  overcome  an  initial
deficit  in  performance  and  to  reach  a  criterion  level  of
proficiency  on  each  device;  and  a  "transfer"  component,
conceived  as  an  estimation  of  the  remaining  trainee  deficit
that  must  be  overcome  in  order  to  demonstrate  a  criterion
level  of  proficiency  on  the  parent  equipment.  It  is  important
to  keep  in  mind  that  the  "total"  effectiveness  of  a
device  is  the  sum  of  AB  and  BD;  even  if  AC  is  less  than  AB
(i.e.,  trainees  will  reach  criterion  on  Device  2  sooner
than  on  Device  1),  CD  may  still  be  greater  than  BD  (i.e.,
the  remaining  deficits  are  greater  for  Device  2).  This  could
occur,  for  example,  if  Device  2  trains  all  the  "easy"
parts,  while  Device  1  trains  the  "hard"  parts.  The  totals
(AB  +  BD,  AC  +  CD)  are  not  necessarily  highly  correlated
with  the  acquisition  components.
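
The bookkeeping behind this comparison can be sketched in a few lines of Python. This is only an illustration of the deficit arithmetic described above, not part of the forecasting procedure itself: the class name, the function name, and all of the numbers are hypothetical, and time and cost are collapsed into a single unit for simplicity.

```python
from dataclasses import dataclass


@dataclass
class DevicePath:
    """One training device in the deficit model (hypothetical units)."""
    name: str
    acquisition: float  # AB or AC: cost/time to reach criterion on the device
    transfer: float     # BD or CD: remaining cost/time on the parent equipment

    @property
    def total(self) -> float:
        # ABD or ACD: total cost/time of learning D by way of this device
        return self.acquisition + self.transfer


def compare(devices, equipment_only):
    """Rank devices by total cost and flag those shorter than the AD baseline."""
    ranked = sorted(devices, key=lambda d: d.total)
    return [(d.name, d.total, d.total < equipment_only) for d in ranked]


# Hypothetical numbers: Device 2 reaches criterion sooner (AC < AB)
# but leaves the larger remaining deficit (CD > BD).
device_1 = DevicePath("Device 1", acquisition=30.0, transfer=20.0)
device_2 = DevicePath("Device 2", acquisition=20.0, transfer=45.0)
AD = 60.0  # assumed cost of training on the operational equipment alone

result = compare([device_1, device_2], AD)
```

With these made-up values Device 2 wins on acquisition alone, yet only Device 1 has a total path shorter than AD, which is the situation the paragraph above warns about.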

Theoretical  Issues:  Summary

In  this  chapter  we  have  discussed  a  number  of
theoretical  issues  related  to  the  evaluation  of  training
device  effectiveness.  We  have  described  how  either  an  empirical
or  analytic  assessment  of  effectiveness  can  be  conducted
within  a  program  evaluation  framework  structured
around  the  concept  of  performance  deficits.  This  approach 
has  the  potential  of  overcoming  several  limitations  found 
in  earlier  forecasting  models.  The  performance  deficit  notion
provides  a  way  of  operationalizing  training  importance
or  criticality  considerations.  The  use  of  explicit
training  program  evaluation  rationales  provides  a  way  of
enhancing  the  diagnostic  utility  of  device  evaluation.
Finally,  the  approach  we  have  described  broadens  the  focus
of  device  evaluation  to  include  learning  as  well  as  trans¬ 
fer  criteria  and  to  permit  consideration  of  the  influence 
of  extra-device  variables  on  effectiveness.  In  the  next 
chapter,  we  explore  some  of  the  real-world  constraints  on 
developing  and  evaluating  a  training  device  effectiveness 
forecasting  procedure. 

3.  Practical  and  Methodological  Issues


In  Chapter  1  we  traced  interest  in  formal  analytic 
methods  for  predicting  training  device  effectiveness  back 
to  certain  constraints  associated  with  the  LCSMM  and  the 
acquisition  process.  In  Chapter  2  we  explored  a  number  of
theoretical  issues  in  the  course  of  laying  out  an  analytic 
approach  to  device  design  and  evaluation  that  interrelates 
a  number  of  predictor  and  criterion  variables  within  a 
program  evaluation  framework.  In  this  chapter  we  are  con¬ 
cerned  about  practical  and  methodological  constraints  on 
the  use  and  evaluation  of  the  type  of  forecasting 
procedures  we  have  been  describing.  In  this  connection, 
three  questions  are  paramount.  First,  what  information  is 
needed  to  evaluate  or  estimate  device  effectiveness?
Second,  what  constraints,  if  any,  does  the  LCSMM  impose  on
the  types  and  levels  of  information  required  to  generate
predictions  of  effectiveness?  And  third,  once  predictions 
have  been  generated,  how  can  we  validate  them  or  otherwise 
assess  their  quality? 

Issue:  What  Data  are  Needed  to  Generate  Forecasts?

Assuming  that  one  wants  to  estimate  device  effectiveness
using  the  type  of  analytic  procedure  just  described,
then  certain  information  requirements  must  be  satisfied.
Specifically,  we  need  information  about  the  objectives  of
training  and  about  the  independent  variables  that  dictate
whether  (how  well)  the  objectives  will  be  achieved.

Specification  of  objectives  and  variables.  Within  the
context  of  a  training  program  rationale,  it  is  imperative
that  the  designers  and  developers  of  a  training  device  be
able  to  describe  the  intermediate  outcomes  they  are  trying
to  achieve.  Toward  that  end  they  need  to  describe  both  the
operational  performance  objective  for  the  parent  equipment
and  the  device-mediated  training  objective.  In
spite  of  the  obviousness  of  this  need,  and  realization  that
such  statements  are  the  sine  qua  non  of  any  form  of  device
evaluation  (i.e.,  empirical  or  analytic),  it  is  exceedingly
difficult  in  practice  to  find  adequate  specifications.
Anyone  who  seriously  doubts  this  assertion  need  only  review
a  random  sample  of  Training  Device  Requirement  (TDR)  statements
to  realize  how  elusive  adequate  specification  really
is.  As  one  would  expect,  the  specifications  are  particularly
nebulous  during  the  earlier  phases  of  device  acquisition
when  there  is  a  scarcity  of  detailed  information.

Ideally,  specification  of  the  performance  objective 
should  be  based  on  operational  needs  associated  with  a 
specific  system  and  one  or  more  missions.  When  the  impetus
for  specification  of  performance  objectives  comes  from  the
development  of  a  new  system,  the  objectives  should  properly
be  defined  as  an  integral  part  of  that  system.  When  the
impetus  stems  from  an  observed  deficiency  in  the  ongoing
performance  of  some  mission-related  task,  the  objectives
ought  to  be  specified  as  part  of  the  "statement  of
need"  that  drives  the  formulation  of  the  training  program.

Whatever  the  impetus  for  their  specification,  training
and  performance  objectives  can  and  should  be  explicitly  included
in  information  provided  to  (or  developed  by)  potential
training  device/system/program  designers  and
evaluators.  They  can  then  be  used  to  derive  criterion
measures  in  support  of  the  empirical  validation  of  any  actual
training  approach.  More  importantly  for  present  purposes,
however,  they  can  be  used  as  the  starting  point  for
an  analytical  model  to  predict  the  impact  of  a  training
device  before  that  device  has  been  actually  designed  and
developed.

As  the  cornerstones  of  empirical  assessments  and
analytic  evaluations,  specifications  of  performance  and
training  objectives  must  be  defined  operationally  in  such  a
manner  that  performance  can  be  reliably  and  unambiguously
measured  or  otherwise  characterized.  The  operational
definition  must  specify  at  least  the  following  items:

•  the  population  of  subjects  to  be  tested;

•  the  specific  behaviors  to  be  measured;

•  the  environment  for  testing  (e.g.,  during
daylight);  and

•  the  level  of  proficiency  on  the  device  and/or
the  parent  equipment  designated  as  the
criterion.

In  the  case  of  Army  training,  the  criterion  may  be 
stated  as  a  population  statistic,  rather  than  an  individual 
level  of  proficiency.  For  example,  instead  of  specifying 
the  performance  criterion  as  some  individual  score  level,
the  operational  criterion  may  be  that  90%  of  trainees  be
able  to  complete  a  particular  task  on  the  training  device
with  no  errors.  By  the  same  token,  specifying  the  training 
or  performance  objective  in  terms  of  a  single  criterion 
level  for  each  task  may  be  unnecessarily  limiting.  Instead 
of  a  "pass-fail"  criterion,  it  may  be  preferable  to  develop 
a  measurement  system  that  discriminates  across  a  range  of 
performance.  The  latter  is  desirable,  as  it  permits  trade¬ 
offs  among  levels  of  performance  on  multiple  objectives, 
and  allows  aggregation  of  scores  into  an  overall 
characterization  of  performance. 
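
The difference between a population-statistic criterion and a graded measurement system can be illustrated with a short Python sketch. The function names, the error-count data, and the 0-to-1 proficiency scale are assumptions made for illustration only; the text specifies nothing beyond the "90% of trainees with no errors" example.

```python
def meets_population_criterion(error_counts, required_fraction=0.90):
    """True if at least `required_fraction` of trainees made zero errors."""
    if not error_counts:
        raise ValueError("no trainee scores supplied")
    error_free = sum(1 for errors in error_counts if errors == 0)
    return error_free / len(error_counts) >= required_fraction


def graded_score(error_counts, max_errors=5):
    """Mean proficiency on a hypothetical 0-1 scale instead of pass/fail."""
    clipped = [min(errors, max_errors) for errors in error_counts]
    return sum(1 - errors / max_errors for errors in clipped) / len(clipped)


# Hypothetical error counts for ten trainees on one device task.
errors = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

passes = meets_population_criterion(errors)  # pass/fail at the group level
score = graded_score(errors)                 # graded characterization
```

The graded score keeps information that the pass/fail rule discards, which is what makes trade-offs and aggregation across several objectives possible.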


In  addition  to  specifications  of  training  and 
performance  objectives,  we  need  information  regarding 
predictor  variables.  That  is,  information  about  displays, 
controls,  instructional  features,  task  analyses/skill 
analyses,  etc.,  has  to  be  provided  in  sufficient  detail  to 
be  of  use  to  the  device  analyst/evaluator.  In  our  earlier
discussion  of  forecasting  procedures  we  identified  five
classes  of  such  variables  including  trainees,  preliminary
training,  tasks,  instructional  variables,  and  the  larger 
training  context. 

All  that  we  are  in  fact  suggesting  in  this  and  the
preceding  discussion  of  objectives  is  that  certain  data
must  be  available  to  support  analytically  derived  estimates
of  training  device  effectiveness.  However,  the  required
data  often  are  not  readily  available.  In  the  next  section,
we  describe  some  of  the  real-world  issues  that  constrain
the  types  and  levels  of  information  about  training  devices
and  programs.

Issue:  How  does  the  LCSMM  Affect  Device  Evaluation?

There  have  been  several  recent  reviews  of  training 
device  design  and  development  within  the  Army  system  acquisition
process  (e.g.,  Kane  &  Holman,  1982;  Matlick,
Rosen,  &  Berger,  1980).  In  the  next  few  paragraphs,  we
will  briefly  describe  the  major  phases  of  the  training
device/simulator  acquisition  process. 

During  the  first  or  Evaluation  of  Alternative  System 
Concepts  (EASC)  phase,  several  key  decisions  are  made  that
ultimately  will  influence  design  of  the  training  devices  in 
important  ways.  For  example,  based  on  results  of  an  ini¬ 
tial  Training  Development  Study,  a  Training  Device  Need 
Statement  is  prepared  that  describes  requirements  for 
device-mediated  individual  and  collective  training. 
Alternative  training  concepts  are  then  considered  in  the
course  of  selecting  a  Best  Technical  Approach  to  meeting 
documented  needs.  These  preliminary  decisions  about  the 
device  and  its  design  are  reflected  in  a  Concept 
Formulation  Package  and  an  Outline  Acquisition  Plan. 

During  the  second  or  Demonstration  and  Validation 
(DVAL)  phase,  the  Outline  Acquisition  Plan  is  updated  and 
used  to  acquire  an  advanced  development  prototype  or  bread¬ 
board  training  device.  It  is  during  this  second  phase  that 
the  breadboard  device  is  used  to  support  a  variety  of  em¬ 
pirical  investigations  comprising  the  Update  Training 
Development  Study  in  which  alternative  training  concepts 
are  assessed  and  the  most  promising  are  validated.  The 
results  serve  to  define  the  Training  Device  Requirement  and 
a  final  Acquisition  Plan. 

In  the  third  or  Full-scale  Engineering  Development
(FSED)  phase,  the  Acquisition  Plan  is  implemented  to  obtain
an  engineering  development  prototype  or  brassboard  training 
device.  At  this  stage  in  the  acquisition  process,  design 
of  the  training  device  has  been  finalized.  Production  runs 
are  imminent.  Assuming  that  the  brassboard  device  success¬ 
fully  passes  various  field  test  evaluations,  the  fourth  or 
Production  phase  of  acquisition  will  begin. 

The  lockstep  nature  of  the  training  device  LCSMM  leads 
to  a  design  dilemma:  early  on  in  the  device  design
process,  there  is  very  little  information  available  about 
the  parent  system  upon  which  design  decisions  can  be  based. 
When  such  information  subsequently  does  become  available, 
it  is  usually  too  late  to  act  on  it,  to  base  major  design
changes  in  the  training  device  upon  it.  In  other  words,
while  detailed  information  about  the  parent  system  is 
needed  for  training  system  design,  design  of  the  device 
must  be  initiated  before  such  information  materializes  in 
any  detail.  The  consequence  of  this  design  dilemma  is  that 
the  training  device  design  process  is  a  bootstrapping 
operation,  consisting  of  a  series  of  approximations  tied  to 
the  evolving  structure  of  the  parent  system. 

As  one  example  of  the  dilemma,  training  device
designers  need,  if  not  detailed  descriptions  of  the  parent 
equipment,  at  least  the  job  descriptions  for  system 
operators.  These  job  descriptions  are  the  source  data  that 
serve  as  input  to  analytic/rational  procedures  (e.g.,  the 
Instructional  Systems  Development  [ISD]  procedures)  for 
determining  how  best  to  design  and  develop  training 
programs.  Typically,  job  descriptions  are  rendered  as  Task 
analyses/Skill  analyses  (TASA).  However,  such  detailed  information,
derived  from  analyses  of  the  parent  system,  is
often  too  late  in  coming  to  be  useful  in  making  early  and 
important  decisions  about  training  concepts  and  device 
design. 

Similarly,  as  we  noted  in  Chapter  1,  there  are  points 
in  the  LCSMM  where  both  empirical  and  analytic  evaluations 
are  supposed  to  occur.  For  example,  the  LCSMM  provides  for
an  empirical  "concept  of  training"  investigation,  a  "breadboard"
evaluation,  a  "brassboard"  evaluation,  and
Operational  Tests  I  and  II.  In  practice,  however,  the 
tight  schedule  of  device  development  and  procurement  usual¬ 
ly  precludes  empirical  evaluations  during  the  design  and 
development  process.  Because  the  training  developers  have 
to  adhere  to  the  faster-paced  materiel  system  acquisition 
schedule,  time  constraints  also  preclude  research  on 
competing  devices  or  training  conceptions  early  in  the
acquisition  process.  If  empirical  evaluations  are 
conducted  (e.g.,  OT  II),  they  usually  occur  much  too  late 
to  modify  the  device  design  based  on  the  results. 

Similarly,  while  the  LCSMM  provides  for  analytic  appraisals 
and  review  of  designs  at  numerous  points,  especially  during 
the  earlier  stages  of  development,  such  appraisals,  as  we 
noted  earlier,  are  neither  systematic  nor  formalized. 

Difficulties  in  obtaining  the  right  type  of  information
at  the  proper  time  are  exacerbated  by  a  natural  tension
between  decisions  related  to  instruction  and  simulation.
As  a  training  system  matures,  it  increasingly  consists
of  two  environments:  an  interactive  instructional
environment,  consisting  of  courseware,  adaptive  training 
features,  etc.,  and  a  simulation  environment,  consisting  of 
those  aspects  of  the  operational  situation  that  are 
represented  in  the  learning  situation.  Training  developers 
have  to  account  for  the  interplay  between  these  two  environments
during  the  design  and  development  of  a  training
device.  In  practice,  when  one  is  emphasized,  the  other  is 
often  downplayed,  with  a  potential  loss  in  effectiveness. 


Collectively,  these  and  other  constraints  on 
information,  arising  from  the  realities  of  the  training 
device  LCSMM,  have  led  designers  and  procurement  personnel 
to  exhibit  two  "tendencies."  One  is  the  tendency  to 
gravitate  toward  high-fidelity  devices.  This  often  (but 
certainly  not  always)  minimizes  the  "training  system" 
design  component.  The  second  is  the  tendency  to  adopt  a 
"design  to  cost"  decision  rule:  design  or  buy  the  device 
with  the  most  instructional  features  and  the  highest  level 
of  fidelity  that  is  within  budget,  even  though  fewer  fea¬ 
tures  or  lower  fidelity  may  still  produce  effective 
training. 

Where  does  all  of  this  leave  an  analytic  model  that 
predicts  device  effectiveness?  The  first  conclusion  to  be 
drawn  is  that  since  empirical  evaluations  of  effectiveness 
are  generally  infeasible  in  practice,  analytic  methods  must 
be  used.  Second,  we  believe  that  sound  analytic  methods 
would  be  used.  Designers  and  developers  are  forced  by  cir¬ 
cumstances  beyond  their  control  to  make  analytic  assess¬ 
ments,  but  have  few  if  any  analytic  tools  with  which  to 
work.  Good  methods  would  rapidly  find  their  way  to  the  ap¬ 
propriate  audience.  Finally,  these  methods  must  be 
flexible  enough  to  allow  evaluations  to  occur  with  a  wide 
range  of  input  information  --  from  very  general  "training
concept"  speculations  early  in  device  acquisition  to  very
detailed  engineering  specifications  later  on.  The 
challenge  is  to  conceive  of  ways  in  which  estimates  of  ef¬ 
fectiveness  can  be  generated  that  overcome  the  many  con¬ 
straints  we  have  alluded  to. 

Issue:  How  Can  Forecasts  be  Validated?

How  would  one  go  about  determining  the  validity  of  a 
device  effectiveness  forecasting  model?  An  obvious  sugges¬ 
tion  is  to  use  empirical  data.  It  is  unfortunate  in  this 
regard  that  opportunities  to  try  out  analytic  models  and  to 
use  the  results  of  empirical  tests  to  revise  the  models  for 
improved  prediction  have  been  extremely  limited.  Tryout 
and  revision  would  require  reliable  measurement  of  both 
predictors  and  criteria.  Practical  constraints  (cost; 
limited  availability  of  devices,  parent  equipment, 
trainees,  and  subject  matter  experts)  have  limited  the 
cases  in  which  both  criterion  and  predictor  measurement 
were  reported  (e.g.,  Wheaton  &  Mirabella,  1972;  Mirabella  & 
Wheaton,  1973;  Wheaton,  Rose,  Fingerman,  &  Leonard,  1976c). 

Part  of  the  measurement  infeasibility  problem  derives 
from  the  explicit  assumption  of  many  analytic  procedures 
that  they  should  be  predicting  transfer  to  operational 
equipment  as  the  index  of  device  effectiveness.  Hence, 
major  components  of  these  models  (e.g.,  "Commonality,"
"Similarity,"  etc.)  are  structured  around  comparisons
between  a  training  device  and  the  operational  equipment.
It  follows  that  any  evaluation  or  testing  of  such  models
must  use  parent  equipment  performance  as  the  criterion. 

However,  even  when  criterion  measures  are  defined  more
broadly  to  include  acquisition  phenomena  and  when  arrangements
can  be  made  to  collect  predictor  and  criterion  data,
other  problems  persist.  The  most  fundamental  of  these  is
that  validation  of  forecasting  procedures,  or  research  on
the  component  variables  and  weightings  underlying  such
procedures,  invariably  requires  some  form  of  regression
paradigm. 

Regression  paradigms  in  which  device  features  are  systematically
varied  and  then  related  to  obtained  (empirical)
effectiveness  scores  are  at  best  infeasible.  Since  the
number  of  variations  in  device  or  training  program  features
is  probably  greater  than  the  number  of  devices,  one  would
not  have  enough  degrees  of  freedom  to  conduct  a  regression 
analysis.  Furthermore,  there  usually  are  not  sufficient 
numbers  of  alternate  devices  that  will  have  been  produced 
to  allow  for  significant  variability  in  any  criterion  measure.

To  illustrate  this  problem,  consider  a  hypothetical
training  system  evaluation  effort:  several  devices  are
used.  Predictions  of  effectiveness  are  generated  for  each 
device.  Then  the  devices  are  used  in  training  and  transfer 
experiments  and  actual  results  are  compared  to  predicted 
values. 

What  we  might  find  is  that  Device  A,  with  high- 
fidelity  stimuli,  motion  cues,  moderate  response 
similarity,  no  augmented  feedback,  and  no  freeze-frame 
capability  did  slightly  better  than  Device  B,  which  con¬ 
tained  low-fidelity  stimuli,  motion  cues,  high  response 
similarity,  augmented  feedback,  and  no  freeze-frame 
capability,  which  did  much  better  than  Device  C  with 
.  .  .  .  Clearly,  we  have  little  hope  of  untangling  these 
outcomes  to  determine  the  critical  device  dimensions  con¬ 
tributing  to  different  levels  of  effectiveness.  Are  there 
other  approaches  to  evaluating  and  refining  forecasting 
models? 
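
The degrees-of-freedom problem can be made concrete with a small Python sketch. Everything here is hypothetical: the three coded features, the two devices, the effectiveness scores, and both sets of weights are invented for illustration. The point is only that when features outnumber devices, very different weightings reproduce the observed scores equally well, so the data cannot say which features matter.

```python
def residual_df(n_devices, n_features, intercept=True):
    """Residual degrees of freedom for an ordinary linear regression."""
    return n_devices - n_features - (1 if intercept else 0)


def predict(weights, features):
    """Weighted sum of coded device features."""
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical coding: [stimulus fidelity, response similarity, augmented feedback]
devices = {"A": [1.0, 0.5, 0.0], "B": [0.0, 1.0, 1.0]}
observed = {"A": 0.6, "B": 0.5}  # hypothetical effectiveness scores


def fits(weights, tol=1e-9):
    """True if the weights reproduce every observed score."""
    return all(abs(predict(weights, devices[d]) - observed[d]) < tol
               for d in devices)


weights_1 = [0.6, 0.0, 0.5]  # "response similarity does not matter"
weights_2 = [0.4, 0.4, 0.1]  # "response similarity matters a lot"

df = residual_df(n_devices=len(devices), n_features=3)
both_fit = fits(weights_1) and fits(weights_2)
```

A negative residual degrees-of-freedom count and two contradictory but perfectly fitting weightings are two views of the same underdetermination.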

4  A  possible  approach  to  this  problem  of  insufficient 
numbers  of  alternative  devices  is  being  investigated  by 
ARI.  This  approach  involves  laboratory  experiments  with
"real"  training  devices,  where  the  experimenter 
artificially  creates  several  versions  of  the  same  device, 
trains  groups  of  subjects  on  each  version,  and  "transfers" 
all  of  the  subjects  to  a  single  "criterion"  version. 

Alternative  empirical  approaches.  A  different
approach  to  measuring  effectiveness  is  contained  in  the
program  evaluation  approach  described  in  the  preceding
chapter.  The  concept  is  that  if  "ultimate"  objectives  cannot
be  measured,  the  intermediate  objectives  and  the  links
between  the  various  objectives  can  be.  For  example,  it  may
be  relatively  easier  to  measure  acquisition  performance  on
the  training  device.  These  scores  could  be  used  as
criterion  data  for  assessment  of  program  features,  such  as
individual  difference  variables,  user  acceptance  indexes,
etc.

Again,  assuming  that  it  is  not  possible  to  measure
transfer  to  the  operational  system,  we  may  still  be  able  to
generate  indirect  or  inductive  support  for  device  effectiveness.
The  argument  is  as  follows:  Transfer  to  a
specific  operational  task  is,  in  essence,  a  generalization
phenomenon:  Will  good  performance  in  one  set  of  circumstances
generalize  to  other  circumstances  (of  which  the
parent  equipment  is  only  one  example)?  That  is,  will  performance
be  maintained  with  a  variety  of  stimuli,  a  variety
of  responses,  different  controls,  different  environmental
circumstances,  etc.?  Evidence  of  generalization  can  be
used  as  inductive  evidence  for  transfer  to  a  particular
(i.e.,  operational)  situation.


Thus,  one  could  use  a  series  of  surrogate 
transfer/generalization  situations,  perhaps  including  dif¬ 
ferent  training  device  configurations  and  other  analogous 
equipment,  to  test  the  generalizability  of  acquired  skill 
and  knowledge.  Our  confidence  in  the  effectiveness  of  a 
device  would  increase  with  each  demonstration  of 
generalization  to  a  different  device  configuration. 

In  conjunction  with  alternative  empirical  approaches, 
the  program  evaluation  framework  prescribes  certain 
analytic  and  statistical  methods  that  can  be  used  to 
validate  a  device  effectiveness  forecast  model. 
Specifically,  when  any  analytic  method  is  used  to  generate 
predictions  of  training  effectiveness,  a  number  or  set  of 
numbers  is  produced.  Is  there  anything  that  can  be  done 
with  these  numbers  to  determine  their  potential  usefulness 
without  collecting  actual  performance  data?  In  the  follow¬ 
ing  sections,  we  describe  several  analyses  that  directly  or 
indirectly  may  shed  light  on  the  validity  of  any  proposed 
forecasting  procedure. 

Sensitivity  analyses.  Suppose  we  generate  a  set  of 
numbers  meant  to  represent  the  effectiveness  of  two 
devices.  For  example,  Device  1  is  estimated  to  have  an 
effectiveness  of  0.20  and  Device  2  is  estimated  at  0.25.
Is  the  difference  between  0.20  and  0.25  "significant,"
i.e.,  would  we  expect  soldiers  trained  on  one  device  to 
perform  better  than  soldiers  trained  on  the  other?  Or  is 
this  difference  within  the  measurement  error  of  the  estima¬ 
tion  system?  To  answer  these  questions,  it  is  necessary  to 
derive  a  distribution  for  any  predictive  index  that  allows 
statements  about  differences  in  predicted  values. 

One  very  interesting  question  is  "sensitivity":
whether  or  not  a  set  of  ratings  differs  significantly  from 
that  which  would  be  obtained  by  random  assignment  of 
ratings  to  the  available  scales.  With  a  lack  of  knowledge 
about  distributional  characteristics  of  model  parameters, 
the  assumption  of  uniform  distributions  provides  the  most 
diffuse  values.  Investigation  of  this  problem  also  pin¬ 
points  some  of  the  problems  that  will  surface  in  inves¬ 
tigating  other  potential  distributions. 
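
One way to picture the random-assignment baseline is a small simulation in Python. This is a sketch under loudly stated assumptions: the text does not give the combination rule, so a hypothetical index (the mean of eight 1-to-5 ratings rescaled to 0-1) stands in for it, and the "expert" ratings are invented. Ratings are drawn from a uniform distribution, as suggested above for the case where nothing is known about the parameters.

```python
import random


def effectiveness_index(ratings, low=1, high=5):
    """Hypothetical combination rule: mean rating rescaled to the 0-1 range."""
    mean = sum(ratings) / len(ratings)
    return (mean - low) / (high - low)


def random_index_distribution(n_scales, n_draws=10000, low=1, high=5, seed=0):
    """Index values produced by uniformly random ratings on every scale."""
    rng = random.Random(seed)
    return sorted(
        effectiveness_index([rng.randint(low, high) for _ in range(n_scales)],
                            low, high)
        for _ in range(n_draws)
    )


def tail_fraction(value, null):
    """Fraction of random rating sets whose index is at least `value`."""
    return sum(1 for x in null if x >= value) / len(null)


null = random_index_distribution(n_scales=8)
observed = effectiveness_index([4, 5, 4, 3, 5, 4, 4, 5])  # invented ratings
p = tail_fraction(observed, null)
```

A small tail fraction suggests the ratings differ from what random assignment would give. The same simulated distribution could, in principle, be used to judge whether two predicted values such as 0.20 and 0.25 are far enough apart to matter, although that depends entirely on the actual combination rule.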

Reliability.  The  reliability  of  an  estimate  of  effectiveness
is  determined  by  the  reliabilities  of  its  constituents.
That  is,  once  the  reliabilities  of  the  operational
measures  of  variables  are  determined,  the
reliability  of  a  measure  of  effectiveness  (which  is  a  com¬ 
bination  of  operational  measures)  may  be  calculated.  For 
simple  combination  rules,  it  may  be  possible  to  determine 
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analytically  the  reliability  of  the  combined  measure.  For 
other,  more  complex  combinatorial  rules,  it  may  be  more 
reasonable  to  determine  the  reliability  by  Monte  Carlo 
simulation. 
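
The Monte Carlo route can be sketched as follows. This is an
illustration under stated assumptions, not a procedure taken
from the report: true constituent scores are assumed to be
independent standard normal values, each constituent is
measured twice with independent normal error sized to give
the stated reliability, and the reliability of the combined
measure is estimated as the correlation between the two
parallel combined scores. All names are hypothetical.

```python
import random
import statistics


def observe(true_value, reliability, rng):
    """Return true_value plus error, with the error variance chosen so
    that var(true) / var(observed) equals `reliability` when the true
    scores have unit variance."""
    error_sd = ((1.0 - reliability) / reliability) ** 0.5
    return true_value + rng.gauss(0.0, error_sd)


def pearson(xs, ys):
    """Pearson product-moment correlation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def additive(values):
    return sum(values)


def multiplicative(values):
    product = 1.0
    for v in values:
        product *= v
    return product


def simulated_reliability(rule, reliabilities, n_devices=5000, seed=1):
    """Correlation between two parallel measurements of the combined score."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_devices):
        true_scores = [rng.gauss(0.0, 1.0) for _ in reliabilities]
        first.append(rule([observe(t, r, rng)
                           for t, r in zip(true_scores, reliabilities)]))
        second.append(rule([observe(t, r, rng)
                            for t, r in zip(true_scores, reliabilities)]))
    return pearson(first, second)


rel_add = simulated_reliability(additive, [0.8, 0.8])
rel_mult = simulated_reliability(multiplicative, [0.8, 0.8])
print(f"additive: {rel_add:.2f}  multiplicative: {rel_mult:.2f}")
```

For the additive rule with independent, equally reliable
constituents, the simulated value should approximate the
constituent reliability, which can also be derived
analytically. For the product of zero-centered constituents
the combined score comes out noticeably less reliable, and
simulation is the easier route.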

One of the most important analyses that can take place
in the evaluation of estimates of effectiveness is the
examination of the properties of the combination rules, to
determine whether they are sensible and whether they predict
desired properties of an effectiveness measure. For example,
if effectiveness is a multiplicative combination of the
constituent variables, one would expect there to be a zero
point for each constituent such that effectiveness would be
a constant whenever at least one of the constituent
measures was at the zero point. On the other hand,
additive rules do not have this property. The properties of
any effectiveness measure that is a simple polynomial can
be examined by looking at its additive and multiplicative
components. In addition, properties of the combination
rules at the extremes will give an indication of the
validity of the rules.
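
The zero-point property can be checked mechanically. The
sketch below is illustrative: the three constituents, their
0-to-1 scaling, and the equal weights are assumptions. One
constituent is held at its zero point while the other two
vary; the multiplicative rule returns the same constant for
every profile, whereas the additive rule does not.

```python
def additive(values, weights):
    """Weighted additive combination rule."""
    return sum(w * v for w, v in zip(weights, values))


def multiplicative(values):
    """Simple multiplicative combination rule."""
    result = 1.0
    for v in values:
        result *= v
    return result


# Hypothetical rating profiles: the first constituent sits at its
# zero point, the other two vary.
profiles = [(0.0, 0.2, 0.9), (0.0, 0.7, 0.6), (0.0, 1.0, 1.0)]
equal_weights = (1 / 3, 1 / 3, 1 / 3)

mult_scores = [multiplicative(p) for p in profiles]
add_scores = [additive(p, equal_weights) for p in profiles]

print("multiplicative:", mult_scores)   # constant across profiles
print("additive:      ", add_scores)    # varies across profiles
```

Whether a constant floor of this kind is desirable is a
substantive question about the device, not a property the
code can decide.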

Incremental validity. One standard method for
assessing validity is to compare the predictions of the
combination rules to expert judgments. The methods of
conjoint measurement, policy capturing (using multiple
regression), and functional measurement (using analysis of
variance) can be applied to compare expert judgments with
the predictions of the model. These three methods differ
in basing their tests either on ordinal or on interval
properties of the data, and in requiring or not requiring a
balanced design. This evaluation uses expert judges to
define the reasonableness of combination rules, and it
performs an analysis similar in many ways to the logical
analysis of properties described above.
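
As a much simpler stand-in for the three methods named
above, the ordinal agreement between model predictions and
expert judgments can be summarized with a rank correlation.
The sketch below computes Spearman's rho; it is not conjoint
measurement, policy capturing, or functional measurement,
and the five devices and their scores are invented for
illustration.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson product-moment correlation."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: model predictions and expert ratings for five devices.
model_predictions = [0.20, 0.25, 0.40, 0.55, 0.70]
expert_judgments = [2, 1, 3, 4, 5]

rho = spearman(model_predictions, expert_judgments)
print(f"rank agreement (Spearman rho) = {rho:.2f}")
```

A rank statistic uses only the ordinal properties of the
data; tests that rely on interval properties, such as
regression-based policy capturing, would require stronger
assumptions about the judgment scale.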

The analysis of the history of devices for which
longitudinal archival data were available would give a
further indication of the validity of the estimate of
effectiveness. For example, we would expect that the
effectiveness of a device would increase as it was modified
and improved, and as problems with it were fixed. Thus we
would expect our prediction of effectiveness to mimic the
notions of device effectiveness that were being used by the
decision makers. If it did, this would argue for the
validity of our predictive estimate. In other words, if the
predicted score increased as the device became more highly
developed, our confidence in the validity of the estimate
would be strengthened.

There is another way that we may obtain information
relevant to the validity of the estimate of effectiveness,
again from an historical analysis of decisions made during
the development of the device. Basically, at any stage in
the process, development of a device may be continued or it
may be stopped. At earlier stages in the acquisition
cycle, development of a device may continue either because
the design is promising or in order to obtain more
information regarding its estimated effectiveness. It would
be expected that at any stage, the decision to continue --
that is, the decision to "purchase" more information about
the device -- would be related to the measurement of
effectiveness. As was pointed out above, the validity of
the predicted estimate of effectiveness would be expected to
increase for devices in later stages of development. If we
assume that the decision maker is (or should be) considering
this, we can compare our estimate to the history of these
decisions. Ultimately, it may be possible to model these
information-purchasing decisions to aid the decision maker
further.

Discriminability. The discriminability of an
aggregate measure of effectiveness depends on the
aggregation rule and on the joint distribution of values of
the individual constituents of the effectiveness measure.
For example, if the combination rule is additive and the
constituents are, in general, negatively correlated, the
aggregate measure will not discriminate among devices,
because a high score on one constituent tends to be offset
by a low score on another. Consequently, the weights that
are used in the effectiveness model will have a great effect
on the relative measures of the effectiveness of two
devices. Since negative correlations may be the product of
the tradeoffs that the designer of the device makes to
arrive at a product with a reasonable cost, it is likely
that the effectiveness scale will have low discriminability.


One way to investigate the discriminability of the
measure is to compare actual devices known to differ in
effectiveness. This comparison gives an indication of the
ability of the measure to detect large differences in
effectiveness. Another way to investigate the
discriminability of the predicted effectiveness measure is
to conduct Monte Carlo simulations in which hypothetical
devices are evaluated. The distributions of the scores on
the constituent variables are varied; for some cases, the
variables are positively correlated; for others, the
variables are independent or negatively correlated. Finally,
distributional properties of the overall measures can be
examined.
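
A Monte Carlo study of this kind might be sketched as
follows. The sketch assumes two standard normal
constituents, an equally weighted additive rule, and the
spread (standard deviation) of the aggregate across
hypothetical devices as a rough proxy for discriminability;
none of these choices comes from the report.

```python
import random
import statistics


def aggregate_spread(rho, n_devices=5000, seed=2):
    """Standard deviation of an equally weighted additive score across
    simulated devices whose two constituents correlate at `rho`."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_devices):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x1 = z1
        # Construct x2 so that corr(x1, x2) = rho with unit variance.
        x2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2
        scores.append(0.5 * x1 + 0.5 * x2)
    return statistics.pstdev(scores)


spread = {rho: aggregate_spread(rho) for rho in (-0.8, 0.0, 0.8)}
for rho, sd in spread.items():
    print(f"correlation {rho:+.1f}: spread of aggregate = {sd:.2f}")
```

With negatively correlated constituents the aggregate
scores bunch together, so devices that differ considerably
on individual constituents receive similar overall scores.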

Efficiency. The best measure of "effort" in
determining the efficiency of a measure is the number of
constituent variables that make up the aggregate measure.
The actual form of the combination rule is probably
unimportant in assessing effort. Thus, validity/number of
constituents is a reasonable measurement of efficiency for
this measure, just as error-reduction/degrees of freedom is
a reasonable method of testing models in the analysis of
variance. In this sense, efficiency is a measure of the
parsimony of the model. A measure that includes a large
number of variables requires great "effort" and is
unparsimonious.
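
The ratio is simple enough to state directly. In the sketch
below the validity coefficients and constituent counts are
invented purely to show the comparison.

```python
def efficiency(validity, n_constituents):
    """Validity per constituent variable (validity / number of constituents)."""
    if n_constituents <= 0:
        raise ValueError("a measure needs at least one constituent")
    return validity / n_constituents


# Hypothetical candidates: (validity coefficient, number of constituents).
candidates = {"full model": (0.60, 12), "reduced model": (0.55, 5)}

scores = {name: efficiency(v, k) for name, (v, k) in candidates.items()}
most_efficient = max(scores, key=scores.get)
print(scores, "->", most_efficient)
```

Here the reduced model gives up a little validity but
requires far fewer ratings, so it is the more efficient of
the two by this index.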

Simplicity. The lack of an effectiveness criterion
requires in most cases that the model with the most
parameters be taken as the criterion. A critical question to
ask is whether some smaller set of parameters (which
presumably could be more reliably and efficiently obtained)
could produce the same predictions. This would obviate the
necessity for cumbersome and potentially unreliable
calculations and judgments. If we consider the predictions
of the most complex model as a criterion, we could use
stepwise regression techniques to determine the relative
ability of simpler models to give the same results as the
most complex model. In addition, using standard statistical
tests, we could compare different (and perhaps simpler)
functional forms for the effectiveness measure with the most
complex (and presumably most accurate) measure. For example,
the ratio of goodness-of-fit measures could be compared
using an F-test.
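
One common form of such a comparison is the F statistic for
nested models, sketched below. Treating the simpler model as
nested within the complex one, and using residual sums of
squares as the goodness-of-fit measure, are assumptions; the
report does not specify which F-test is intended. The
numerical inputs are invented.

```python
def nested_f_statistic(rss_simple, rss_complex, p_simple, p_complex, n_obs):
    """F statistic comparing a simpler model nested within a complex one.

    rss_*  : residual sums of squares of each model
    p_*    : number of fitted parameters in each model
    n_obs  : number of observations (e.g., rated devices)
    """
    df_num = p_complex - p_simple
    df_den = n_obs - p_complex
    if df_num <= 0 or df_den <= 0:
        raise ValueError("degrees of freedom must be positive")
    return ((rss_simple - rss_complex) / df_num) / (rss_complex / df_den)


# Hypothetical fit results for a 3-parameter and a 7-parameter model.
f_value = nested_f_statistic(
    rss_simple=12.0, rss_complex=10.0, p_simple=3, p_complex=7, n_obs=47
)
print(f"F(4, 40) = {f_value:.2f}")
```

The statistic would then be compared with a tabled critical
value for the corresponding degrees of freedom; a small F
suggests the extra parameters add little.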


Care should be taken, however, in considering these
simplicity analyses. While simplicity is an important
virtue for this particular use of the model (i.e., generating
a single measure of "predicted effectiveness"), it may not
be desirable for other uses of the model, such as those
that depend on its diagnostic power.


Practical and Methodological Issues: Summary


To be maximally useful, any model must be sensitive to
variations in the quality and quantity of input
information. For decisions early in the LCSMM, not much more
than general "function" statements are available regarding
task and training demands. The data are sufficient only for
the most general types of analyses and for the grossest of
decisions regarding training device (or system) concepts.
As more data become available -- both about the operational
task and equipment, and about the proposed training system
-- more detailed judgments and estimates of effectiveness
can be made.

Thus, the practical constraints of the LCSMM require
that an effectiveness evaluation model be capable of
generating predictions with both general and detailed
inputs. Similarly, and perhaps more importantly, models
should be capable of providing diagnostic information --
why the design concept is judged ineffective, how a design
concept could be improved -- at all stages of development.

There also are practical constraints on the evaluation
of a device effectiveness forecasting system. One approach
is to conduct the required empirical tests when feasible.
When such tests are infeasible, other less direct
assessments may be required. These must be designed to
accumulate presumptive evidence for the validity of the
forecasting models. It is essential that development and
evaluation of these models continue, despite these practical
obstacles. In this chapter, we have suggested several
directions in which to proceed.


Appendix A: Indexes of Transfer and Theoretical Bases

There are several commonly used indexes of transfer.
For example, it is possible to express the amount of
transfer between a training device and the parent
operational equipment relative to the performance of an
untrained control group of soldiers on the parent equipment
(e.g., Gagne, Foster, & Crowley, 1948):

Percentage of Transfer = [(E - C) / C] x 100.

In  this  formulation,  E  refers  to  the  performance  of  the  ex¬ 
perimental  group  of  soldiers  on  the  parent  equipment  fol¬ 
lowing  training  on  the  training  device,  and  C  refers  to  the 
performance  of  the  control  group  of  soldiers  on  the  parent 
equipment,  not  having  been  trained  on  the  training  device. 

Another commonly used index compares the obtained
transfer with the "maximum possible value" (Murdock, 1957).
The maximum possible value is the best score hypothetically
attainable on the parent equipment:

Percentage of Transfer = [(E - C) / (T - C)] x 100,
where T is the maximum possible score.


A third index expresses transfer as the ratio of the
difference between the experimental and control scores to
the sum of these scores (e.g., Murdock, 1957):

Percentage of Transfer = [(E - C) / (E + C)] x 100.
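
The three indexes can be compared side by side on the same
scores. The sketch below assumes that higher scores mean
better performance; the function names and the example
scores (E = 80, C = 60, T = 100) are invented for
illustration.

```python
def transfer_vs_control(e, c):
    """[(E - C) / C] x 100: gain relative to control-group performance."""
    return (e - c) / c * 100.0


def transfer_vs_maximum(e, c, t):
    """[(E - C) / (T - C)] x 100: gain relative to the maximum possible gain."""
    return (e - c) / (t - c) * 100.0


def transfer_vs_sum(e, c):
    """[(E - C) / (E + C)] x 100: gain relative to the sum of both scores."""
    return (e - c) / (e + c) * 100.0


# Hypothetical scores on the parent equipment.
e, c, t = 80.0, 60.0, 100.0

index_control = transfer_vs_control(e, c)
index_maximum = transfer_vs_maximum(e, c, t)
index_sum = transfer_vs_sum(e, c)

print(f"relative to control: {index_control:.1f}%")
print(f"relative to maximum: {index_maximum:.1f}%")
print(f"relative to sum:     {index_sum:.1f}%")
```

The same pair of group scores yields three quite different
percentages, so a reported "percentage of transfer" is only
interpretable when the index is named.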

All of the above formulations can be applied equally
well to first-trial or "cumulative" (i.e., summative)
performance. However, more elaborate indexes of transfer are
necessary when learning rates are considered (e.g., Roscoe,
1971; 1972). The skill acquisition curve for the
operational task on the parent equipment must be described
by at least two parameters: the performance level at the
beginning of practice (i.e., "initial transfer") and the
rate of change in performance across practice. It is
entirely possible that different characterizations of device
effectiveness might be associated with these two parameters.
For example, Hammerton (1963), using an airplane simulator,
found initial negative transfer, but positive long-term
transfer (i.e., total "savings" on time to criterion on the
operational task).

Just as there are several popular empirical indexes of
transfer, there also are different perspectives about its
theoretical underpinnings. The theoretical bases of the
transfer phenomenon have a long history in applied
psychology, dating back to Thorndike (e.g., Thorndike &
Woodworth, 1901; Thorndike, 1903). He proposed a theory
of "identical elements," claiming that there would be
positive transfer in the learning of a second task to the
extent that that task required components learned in some
other task. In this view, transfer was quite specific.
Facilitation of performance on the new task would not occur
unless at least part of the new task consisted of
"elements" specifically learned in the first task.

More commonly, transfer is formulated in
stimulus-response terminology, with the Osgood (1949)
transfer surface as the principal exemplar: the amount and
direction of transfer vary as a function of stimulus and
response similarity between two tasks. According to the
Osgood surface, when the stimuli for two tasks are identical
but the responses are completely unrelated, maximum negative
transfer theoretically will occur. Maximum positive transfer
is expected when both stimuli and responses are identical
for the two tasks.

In current cognitive psychological terminology,
transfer depends on the modification of pre-existing
knowledge structures ("schemas") by training so that new
information (e.g., about the next task to be learned) can be
efficiently incorporated (e.g., Neisser, 1976). Transfer
will occur when, during practice on an initial task, new
information is added to existing knowledge bases that
trainees can apply to the second or new task.

We also can consider the transfer paradigm as a
strategy selection situation (e.g., Gibson & Gibson, 1955).
When faced with a new task, people apply previously learned
strategies. The selection of a particular strategy depends
upon the perceived degree of similarity between the new
situation and whatever the performer has previously
learned. If the circumstances or context of the new task is
similar to that of the previously learned task, trainees
will try the strategies that were previously successful.
Positive transfer will occur if these strategies are
"appropriate"; no transfer or even negative transfer will
occur if the perceived similarity leads to the selection of
inappropriate strategies -- that is, the trainee perceives
(and acts on) a similarity when none exists.


